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2.8  Comparison  between  the  magnitude  of  the  total  pressure  field  (in  Sound  Pressure  Level  SPL, 
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4.4  Left:  Cost-accuracy  trade-off  of  RK  4,  to  achieve  lower  error  the  step  size  of  RK4  needs  to 
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2.2  Parameters  used  in  our  system .  44 


2.3  Performance  statistics.  Abbreviations  are  as  follows  :  “#objs.”  denotes  the  number  of 
objects  in  the  scene,  “#freq.”  is  the  number  of  frequency  samples  in  the  range  [0-1  kHz]  and 
“#srcs”  is  the  number  of  sound  sources.  For  the  precomputation  stage,  the  term  “wave  sim.”  is 
the  total  simulation  time  of  the  numerical  wave  solver  for  all  frequencies,  “per-object”  denote 
the  compute  time  for  the  per-object  transfer  function  for  all  unique  objects  and  “inter-object” 
denote  the  compute  time  for  inter-object  transfer  functions  for  all  object  pairs,  “source-field” 
is  time  to  express  each  sound  source  in  terms  of  incoming  multipoles  for  all  objects,  and 
“global-solve”  is  time  to  compute  equivalent  source  strengths  for  all  objects.  The  “wave 
sim.”  step  is  parallelized  over  all  unique  objects  whereas  the  remaining  precomputation  steps 
are  parallelized  over  all  frequencies.  The  term  “wall-elk  time”  is  the  total  wall-clock  time 
computed  by  uniformly  distributing  all  the  parallel  processes  over  all  the  cores  of  the  64-node 
cluster  with  8  cores  per  node  (512  cores  in  total).  At  runtime,  the  total  number  of  equivalent 
sources  “#  eq.  srcs”  (in  million  M),  performance  “eval.”  and  storage  requirement  “storage” 

(fixed  and  per  source  cost)  for  all  objects  for  all  frequencies  are  also  specified.  For  column 
“#objs.” ,  notation  a*  +  b*  denotes  that  first  object  has  been  instanced  a  times  and  second 
object  instanced  b  times,  but  their  per-object  transfer  functions  are  computed  only  once  for 
each  unique  object .  46 
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2.4  Runtime  memory  requirements  per  source,  for  FDTD  [145],  ARD  [124],  BEM/FMM-BEM  [92], 
and  our  ESM  technique  with  error  thresholds  a  =  15%,  =  1%  at  maximum  simulation  fre¬ 
quency  vmax  =  1018 Hz.  Refer  to  Section  2.7.3  and  Appendix  2.9.3  for  more  details .  50 

3.1  Precomputation  Performance  Statistics.  The  rows  “Building+small” ,  “Building+medium” , 
and  “Building+large”  correspond  to  scenes  with  a  building  surrounded  by  small,  medium,  and 
large  walls,  respectively.  “Reservoir”  and  “Parking”  denote  the  reservoir  and  underground 
parking  garage  scene  respectively.  For  a  scene,  “#src”  denotes  the  number  of  sound  sources 
in  the  scene,  “#freq.”  is  the  number  of  frequency  samples,  and  “#eq.  srcs”  denotes  the  num¬ 
ber  of  equivalent  sources.  The  first  part,  “Hybrid  Pressure  Solving”,  includes  all  the  steps 
required  to  compute  the  final  equivalent  source  strengths,  and  is  performed  once  for  a  given 
sound  source  and  scene  geometries.  The  second  part,  “Pressure  Evaluation”,  corresponds  to 
the  cost  of  evaluating  the  contributions  from  all  equivalent  sources  at  a  listener  position  and 
is  performed  once  for  each  listener  position.  For  the  numerical  technique,  “wave  sim.”  refers 
the  total  simulation  time  of  the  numerical  wave  solver  for  all  frequencies;  “per-object”  denotes 
the  computation  time  of  for  per-object  transfer  functions;  “inter-object”  is  the  inter-object 
transfer  functions  for  each  pair  of  objects  (including  self-inter-object  transfer  functions,  where 
the  pressure  wave  leaves  a  near-object  region  and  reflected  back  to  the  same  object);  “source 
+  global  solve”  is  the  time  to  solve  the  linear  system  to  determine  the  strengths  of  incoming 
and  outgoing  equivalent  sources.  For  the  geometric  technique,  tris”  is  the  number  of  tri¬ 
angles  in  the  scene;  “order”  denotes  the  order  of  reflections  modeled;  rays”  is  the  number 
of  rays  emitted  from  a  source  (sound  source  or  equivalent  source).  The  column  “propagation 
time”  includes  the  time  of  finding  valid  propagation  paths  and  computing  pressures  for  any 
intermediate  step  (e.g.  from  one  object  to  another  object’s  offset  surface) .  76 
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3.2  Runtime  Performance  on  a  Single  Core.  For  each  scene,  “#IR  samples”  denotes  the 

number  of  IR’s  sampled  in  the  scene  to  support  moving  listeners  or  sources;  “Memory”  shows 
the  memory  to  store  the  IR’s;  “Time”  is  the  total  running  time  needed  to  process  and  render 
each  audio  buffer .  79 

3.3  Memory  Cost  Saving.  The  memory  required  to  evaluate  pressures  at  a  given  point  of 

space.  This  corresponds  to  the  same  operation  shown  in  the  rightmost  column  of  performance 
Table.  Compared  to  standard  numerical  techniques,  our  method  provides  3  to  7  orders  of 
magnitude  of  memory  saving  on  the  benchmark  scenes .  84 

4.1  Same  quality  comparison.  A  fan  of  rays  are  traced  under  a  downward  refracting  atmo¬ 
sphere  to  a  range  of  10km.  At  each  level  of  relative  error  in  the  ray  tracing  results,  the  average 
number  of  ray  curve  segments  per  propagation  path  is  compared  with  the  average  number  of 
Runga-Kutta  4  steps.  Analytic  ray  curve  is  able  to  achieve  same  level  of  accuracy  with  much 

less  tracing  cost . 100 

4.2  Same  speed  comparison.  A  fan  of  rays  are  traced  under  a  downward  refracting  atmosphere 
to  a  range  of  10km.  With  comparable  number  of  ray  curve  segments  and  Runga-Kutta  4  steps, 
our  ray  tracer  (RT)  is  able  to  achieve  lower  levels  of  relative  error  in  ray  tracing  results  (both 

hit  points  and  path  length)  than  RK4 . 101 

4.3  Breakdown  of  ray  tracing  time:  the  computation  of  closed-form  ray  trajectory  and  closed- 
form  evaluations  of  ray  properties  for  each  segment  takes  very  little  computation  (less  than 
0.001%  of  total  frame  time)  and  thus  is  omitted  here.  The  percentage  that  each  kind  of 
computation  takes  relative  to  the  total  frame  time  is  included  in  parenthesis,  together  with 

the  number  of  ray-BV  and  ray-surface  intersections . 105 

6.1  Parameters  for  the  ARD  technique . 140 
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Chapter  1 


Introduction 


Outdoor  acoustics  simulation,  i.e. ,  computer  modeling  of  outdoor  sound  propagation,  plays  a  vital  role  in 
military,  government,  and  civilian  applications.  Outdoor  acoustics  simulation  can  be  used  to  determine  the 
vulnerability  of  battlefield  resources  in  urban  and  outdoor  warfare  scenarios.  For  example,  acoustics  predic¬ 
tion  can  be  used  to  plan  the  path  of  a  manned  or  unmanned  aerial  vehicle  over  an  urban  environment  while 
taking  into  account  its  noise  profile,  so  as  to  minimize  its  audible  range  and  probability  of  detection.  It  is 
also  useful  for  ground  troops  to  be  able  to  pinpoint  the  origin  of  gunshots  based  on  their  acoustic  signa¬ 
tures.  In  all  these  examples,  rapidly  changing  battlefield  situations  would  necessitate  rapid  prediction  using 
acoustics  simulation.  Furthermore,  these  applications  require  the  ability  to  handle  complex  and  dynamic 
environments,  in  order  to  be  deployed  in  the  field. 

As  part  of  our  STTR  Phase  II,  we  developed  outdoor  acoustics  simulation  technology  that  can  meet 
these  stringent  requirements.  Figure  1.1  provides  and  overview  of  our  system.  In  addition  to  battlefield  and 
security  applications  such  as  tactical  planning  and  sensor  optimization,  efficient  outdoor  acoustics  prediction 
is  also  needed  for  many  commercial  applications:  it  can  be  used  to  reduce  the  noise  profile  of  aircraft, 
aid  architects  in  reducing  the  impact  of  outdoor  noise  on  their  constructions,  and  help  city  planners  and 
industrial  operators  in  complying  with  noise  level  regulations. 
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Figure  1.1:  An  overview  of  our  multi-domain  acoustic  simulation  approach.  The  circles  denote  source  (S) 
and  receiver  (R),  and  the  obstacles  are  denoted  by  the  numbers  1  and  2.  The  scene  is  decomposed  into  sub- 
domains,  i.e.  boundary  regions  around  the  obstacles  and  the  free  region.  Sound  propagation  from  the  source 
to  the  boundary  regions  is  computed,  followed  by  boundary-region  computations,  free-region  computations, 
and  interface-handling  operations.  Finally,  the  sound  is  propagated  from  the  boundary  regions  to  the  receiver 
to  compute  the  final  output  response. 

1.1  Phase  II  Key  Results 

We  developed  a  novel  hybrid  sound  propagation  approach  suitable  for  modeling  acoustics  of  large,  open 
spaces  spanning  hundreds  of  meters,  with  a  small  memory  footprint  efficiently.  At  a  high-level,  the  scene 
is  decomposed  into  disjoint  rigid  objects.  Compute-intensive  wave-based  methods  are  applied  only  around 
rigid  objects,  effecient  non-linear  ray  tracing  methods  are  used  to  model  acoustics  between  the  rigid  objects 
while  taking  into  account  inhomogenous  medium  properties,  and  finally  combining  the  results  from  different 
techniques  by  coupling  them  at  the  boundary  interfaces. 

The  acoustic  behavior  of  each  object  is  captured  by  a  compact  per-object  transfer-function  relating  the 
amplitudes  of  a  set  of  incoming  equivalent  sources  to  outgoing  equivalent  sources.  Accurate  wave-based 
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methods  can  be  used  to  compute  per-object  transfer  function  as  the  size  of  the  these  objects  is  much  smaller 
compared  to  the  overall  scene.  Pairwise  acoustic  interactions  between  objects  can  be  computed  either 
analytically  for  homogenous  medium  or  using  non-linear  ray  tracing  for  non-homogenous  medium  to  yield 
compact  inter-object  transfer  functions.  The  global  sound  field  accounting  for  all  orders  of  interaction  is 
computed  using  these  transfer  functions. 

In  this  section,  we  summarize  our  key  Phase  II  accomplishments: 

•  Pre-computed  Outdoor  Acoustics  Transfer  Operators:  We  developed  novel  wave-based  sound  propaga¬ 
tion  algorithm  that  captures  acoustic  effects  such  as  high-order  diffraction  and  scattering,  using  an 
equivalent  source  formulation.  Our  technique  can  perform  accurate  sound  propagation  on  large,  open 
scenes  in  real-time,  has  a  small  memory  footprint,  and  allows  flexible  efflciency-to-accuracy  tradeoffs. 
Compared  to  directly  storing  and  convolving  wave-solver  solutions  for  auralization,  we  reduce  the 
memory  usage  more  than  100  times. 

•  Multi- domain  Outdoor  Acoustics  Simulation :  We  developed  a  novel  hybrid  technique  for  sound  prop¬ 
agation  that  combines  the  strengths  of  numerical  and  geometric  acoustic  techniques  for  the  different 
parts  of  the  domain:  the  more  accurate  and  costly  numerical  technique  is  used  to  model  wave  phenom¬ 
ena  in  near-object  regions,  while  the  more  efficient  geometric  technique  is  used  to  handle  propagation 
in  far- field  regions  and  interaction  with  the  environment.  The  sound  pressure  field  generated  by  the 
two  techniques  is  coupled  using  a  novel  two-way  coupling  procedure. 

•  Efficient  Non-Linear  Ray  Tracer.  We  developed  an  efficient  non-linear  ray  tracer  for  inhomogeneous 
media  by  tracing  analytic  ray  curves  based  on  local  media  gradients  as  primitives.  Segments  of  ray 
curves  are  computed  by  sampling  the  media  gradient  on-the-fly,  accounting  for  both  inhomogeneous 
and  moving  media  without  the  need  to  pre-compute  explicit  cell  structures.  Acceleration  based  on  BVH 
is  readily  adapted  to  speed  up  surface  intersections  of  the  ray  curves,  enabling  logarithmic  scaling  with 
scene  complexity  and  allowing  dynamic  scenes. 

•  Accuracy  Analysis  and  Validation:  We  compared  the  accuracy  of  our  efficient  pre-computed  transfer 


21 


operators,  multi-domain  outdoor  acosutics  simulation,  and  non-linear  ray  tracer  against  analytical 
solutions  and  measurement  data.  Our  results  show  good  agreement  between  simulation  and  analytical 
or  measurement  data. 

•  Performance  Analysis:  We  have  performed  performance  analysis  of  our  implementation  of  pre-computed 
transfer  operators,  muti-domain  outdoor  acoustics  simulation,  and  efficient  non-linear  ray  tracer. 

Technical  Submissions  and  Patents  The  work  performed  during  Phase  II  has  resulted  in  the  following 
papers,  which  have  either  been  accepted  for  publication  or  are  under  review: 

1.  Wave-based  sound  propagation  in  large  open  scenes  using  an  equivalent  source  formulation.  Ravish 
Mehra,  Nikunj  Raghuvanshi,  Lakulish  Antani,  Anish  Chandak,  Sean  Curtis,  and  Dinesh  Manocha. 
2013.  ACM  Trans.  Graph.  32,  2,  Article  19  (April  2013),  13  pages. 

2.  Wave-ray  coupling  for  interactive  sound  propagation  in  large  complex  scenes.  Hengchin  Yeh,  Ravish 
Mehra,  Zhirnin  Ren,  Lakulish  Antani,  Dinesh  Manocha,  and  Ming  Lin.  2013.  ACM  Trans.  Graph.  32, 
6,  Article  165  (November  2013),  11  pages. 

3.  Source  and  Listener  Directivity  for  Interactive  Wave-Based  Sound  Propagation.  Ravish  Mehra,  Lakul¬ 
ish  Antani,  Sujeong  Kim,  and  Dinesh  Manocha.  2014.  IEEE  Transactions  on  Visualization  and 
Computer  Graphics  20,  4  (April  2014),  495-503. 

4.  WAVE:  Interactive  Wave-based  Sound  Propagation  for  Virtual  Environments.  Ravish  Mehra,  Atul 
Rungta,  Abhinav  Golas,  Ming  C  Lin,  Dinesh  Manocha.  Visualization  and  Computer  Graphics,  IEEE 
Transactions  on  ,  vol.21,  no. 4,  pp. 434, 442,  April  18  2015. 

5.  Analytic  Ray  Curve  Tracing  for  Outdoor  Sound  Propagation.  Qi  Mo,  Hengchin  Yeh,  Ming  Lin,  Dinesh 
Manocha.  Under  review  (Applied  Acouistics) 

6.  Outdoor  Sound  Propagation  with  Analytic  Ray  Curve  Tracer  and  Gaussian  Beam.  Qi  Mo,  Hengchin 
Yeh,  Ming  Lin,  and  Dinesh  Manocha.  Under  review  (The  Journal  of  the  Acoustical  Society  of  America) 
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7.  High-order  diffraction  and  diffuse  reflections  for  interactive  sound  propagation  in  large  environments. 
Carl  Schissler,  Ravish  Mehra,  and  Dinesh  Manoclra.  2014.  ACM  Trans.  Graph.  33,  4,  Article  39  (July 
2014),  12  pages. 

8.  Acoustic  pulse  propagation  in  an  urban  environment  using  a  three-dimensional  numerical  simula¬ 
tion. Ravish  Mehra,  Nikunj  Raghuvanshi,  Anish  Chandak,  Donald  G.  Albert,  D.  Keith  Wilson,  Dinesh 
Manoclra.  The  Journal  of  the  Acoustical  Society  of  America,  135,  3231-3242  (2014) 

9.  Adaptive  modeling  of  details  for  physically-based  sound  synthesis  and  propagation.  Hengchin  Yeh. 
PhD  Thesis.  August,  2014. 

10.  Efficient  Techniques  for  Wave-Based  Sound  Propagation  in  Interactive  Applications.  Ravish  Mehra. 
PhD  Thesis.  Under  review. 

We  have  also  applied  for  five  patents  based  on  the  work  performed  during  our  STTR  Phase  II. 

1.2  Organization 

As  part  of  our  Phase  II  efforts,  we  have  developed  the  key  components  of  a  hybrid,  multi-domain  outdoor 
acoustics  simulation  algorithm,  significantly  advancing  the  state-of-the-art.  The  rest  of  the  report  is  organized 
as  follows:  Chapter  2  discusses  our  wave-based  sound  propagation  techniques  for  homegenous  medium  based 
on  pre-computed  acoustic  transfer  operators.  Chapter  3  discusses  our  novel  hybrid  approach  that  couples 
geometric  acoustics /non- linear  ray  tracer  and  numerical  acoustic  techniques  to  for  a  multi-domain  outdoor 
acoustics  simulator.  Chapter  4  presents  our  non-linear  ray  tracing  method  which  uses  analytic  ray  curve  as 
tracing  primitives  to  improve  the  efficiency  of  outdoor  sound  propagation  in  fully  general  settings.  Chapter  5 
and  Chapter  6  analyze  the  accuracy  of  our  algorithms. 


23 


Chapter  2 


Wave-Based  Sound  Propagation  in 
Large  Open  Scenes  using  an 
Equivalent  Source  Formulation 

2.1  Overview 

We  present  a  novel  approach  for  wave-based  sound  propagation  suitable  for  large,  open  spaces  spanning 
hundreds  of  meters,  with  a  small  memory  footprint.  The  scene  is  decomposed  into  disjoint  rigid  objects. 
The  free-held  acoustic  behavior  of  each  object  is  captured  by  a  compact  per-object  transfer-function  relating 
the  amplitudes  of  a  set  of  incoming  equivalent  sources  to  outgoing  equivalent  sources.  Pairwise  acoustic 
interactions  between  objects  are  computed  analytically  to  yield  compact  inter-object  transfer  functions.  The 
global  sound  held  accounting  for  all  orders  of  interaction  is  computed  using  these  transfer  functions.  The 
runtime  system  uses  fast  summation  over  the  outgoing  equivalent  source  amplitudes  for  all  objects  to  auralize 
the  sound  held  for  a  moving  listener  in  real-time.  We  demonstrate  realistic  acoustic  effects  such  as  diffraction, 
low-passed  sound  behind  obstructions,  focusing,  scattering,  high-order  rehections,  and  echoes,  on  a  variety 
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of  scenes. 


2.2  Introduction 

Interactive  sound  propagation  has  emerged  as  a  powerful  tool  in  computer  graphics  to  enhance  the  realism  of 
virtual  worlds  by  predicting  the  behavior  of  sound  as  it  interacts  with  the  environment  [146,  45,  94].  In  order 
to  accurately  capture  important  acoustic  phenomena  in  general  scenarios,  including  interference,  diffraction, 
scattering,  sound  focusing  (caustics),  and  higher-order  wave  effects  resulting  from  their  combination,  it  is 
important  to  develop  techniques  that  can  directly  solve  the  acoustic  wave  equation.  There  is  extensive  work 
in  scientific  computing  and  acoustics  on  numerical  methods  to  solve  the  wave  equation.  Furthermore,  there 
has  been  considerable  interest  in  developing  interactive  wave-based  techniques  to  model  free-space  sound 
radiation  [71],  first-order  scattering  from  surfaces  [156],  and  sound  propagation  for  indoor  scenes  [135,  125]. 

Large,  open  scenes,  which  arise  in  many  applications  ranging  from  games  to  training  or  simulation 
systems,  present  a  significant  challenge  for  interactive,  wave-based  sound  propagation  techniques.  State-of- 
the-art  wave  simulation  methods  can  take  hours  of  computation  and  gigabytes  of  memory  for  performing 
sound  propagation  in  indoor  scenes  such  as  concert  halls  [131,  124].  For  large,  open  scenes  spanning  hundreds 
of  meters,  it  is  challenging  to  run  these  techniques  in  real-time.  On  the  other  hand,  geometric  (ray-based) 
acoustic  techniques  can  provide  real-time  performance  for  such  environments.  However,  geometric  techniques 
are  better  suited  for  higher  frequencies  due  to  the  inherent  assumption  of  rectilinear  propagation  of  sound 
waves.  Therefore,  accurately  modeling  diffraction  and  higher-order  wave  effects  with  these  techniques  remains 
a  significant  challenge,  especially  at  low  frequencies. 

In  this  paper,  we  present  a  novel  approach  for  precomputed,  wave-based  sound  propagation  that  is 
applicable  to  large,  open  scenes.  It  is  based  on  the  equivalent  source  method ,  which  has  been  widely  studied 
for  radiation  and  scattering  problems  in  acoustics  and  electromagnetics  [40]  and  more  recently  introduced 
to  computer  graphics  [71].  Our  approach  consists  of  two  main  stages:  preprocessing  and  runtime.  During 
preprocessing,  we  decompose  the  scene  into  disjoint,  well-separated  rigid  objects.  The  acoustic  behavior  of 
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each  object,  taken  independently,  is  characterized  by  its  per-object  transfer  function  that  maps  an  arbitrary 
incident  field  on  the  object  to  the  resulting  scattered  field.  We  propose  an  equivalent  source  formulation  to 
express  this  transfer  function  as  a  compact  scattering  matrix.  Pairwise  acoustic  coupling  between  objects 
is  then  modeled  by  computing  inter-object  transfer  functions  between  all  pairs  of  objects  that  maps  the 
outgoing  scattered  field  from  one  object  to  the  incoming  field  on  another  object.  These  transfer  functions 
are  represented  compactly  by  using  the  same  equivalent  source  framework  to  yield  interaction  matrices. 
Acoustic  transfer  between  multiple  objects  can  therefore  be  represented  using  chained  multiplication  of  their 
scattering  and  interaction  matrices.  Finally,  the  acoustic  response  of  the  scene  to  a  static  source  distribution 
is  computed  by  solving  a  global  linear  system  that  accounts  for  all  orders  of  inter-object  wave  propagation. 

At  runtime,  fast  summation  over  all  outgoing  equivalent  sources  for  all  objects  is  performed  at  the  listener 
location.  The  computed  response  is  used  for  real-time  sound  rendering  for  a  moving  listener.  Multiple 
moving  sources,  with  a  static  listener,  are  handled  by  exploiting  acoustic  reciprocity.  The  runtime  memory 
and  computational  requirements  are  proportional  to  the  number  of  objects  and  their  outgoing  scattered  field 
complexity  (usually  a  few  thousand  equivalent  sources  per  frequency  for  a  few  percent  error),  instead  of  the 
volume  or  surface  area  of  the  scene.  Thus,  our  technique  takes  an  object-centric  approach  to  wave-based 
sound  propagation.  The  key  contributions  of  our  work  include: 

•  Object-based  sound  field  decomposition  using  per-object  and  inter-object  acoustic  transfer  functions  for 
enabling  real-time,  wave-based  sound  propagation  on  large,  open  scenes. 

•  Compact  per-object  transfer  using  equivalent  sources  to  model  the  scattering  behavior  of  an  object 
mapping  arbitrary  incident  fields  to  the  resultant  scattered  fields. 

•  Compact  analytical  coupling  of  objects  is  achieved  by  expressing  inter-object  transfer  functions  in  the 
same,  compact  equivalent  source  basis  as  used  for  per-object  transfer. 

•  A  fast,  memory- efficient  run-time  enables  real-time  sound  rendering,  while  requiring  only  a  few  tens 
of  megabytes  of  memory. 

Our  approach  is  well-suited  for  quick  iterations  while  authoring  scenes.  Per-object  transfer  functions, 
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which  take  a  significant  portion  of  the  precomputation  time  of  our  method,  are  independent  of  the  scene 
and  can  thus  be  stored  in  a  lookup  table.  Therefore,  adding,  deleting  or  moving  a  few  objects  in  an  existing 
scene  has  low  precomputation  overhead,  linear  in  the  number  of  objects. 

We  have  tested  our  technique  on  a  variety  of  scenarios  (see  Figure  2.1)  and  integrated  our  system  with  the 
Valve’s  Source™  game  engine  from  Half-Life  2.  Our  technique  generates  realistic  acoustic  effects  and  takes 
orders  of  magnitude  less  runtime  memory  compared  to  state-of-the-art  wave  solvers,  enabling  interactive 
performance.  To  the  best  of  our  knowledge,  it  is  the  first  real-time  technique  for  accurate,  wave-based  sound 
propagation  in  large,  open  scenes. 


Figure  2.1:  Our  algorithm  accurately  models  realistic  acoustic  effects,  such  as  diffraction,  scattering,  focusing, 
and  echoes,  in  large,  open  scenes  at  real-time  rates.  We  reduce  the  runtime  memory  usage  by  orders  of 
magnitude  compared  to  state-of-the-art  wave  solvers,  enabling  real-time,  wave-based  sound  propagation  in 
scenes  spanning  hundreds  of  meters:  a)  reservoir  scene  (from  Half-Life  2),  b)  Christmas  scene,  and  c)  desert 
scene. 


2.3  Related  work 

Our  technique  has  close  theoretical  parallels  with  prior  numerical  wave  solvers.  We  first  explore  these 
connections,  followed  by  related  work  on  interactive  geometric  and  wave-based  techniques. 

2.3.1  Numerical  Wave  Solvers 

Research  in  wave-based  acoustic  simulation  techniques  spans  a  broad  range  of  areas  such  as  noise  control, 
automotive  design,  urban  architectural  planning,  and  concert  hall  design.  Wave  solvers  can  be  classified 
into  volumetric  and  surface-based  approaches.  The  most  common  among  volumetric  techniques  are  the 
finite  element  method  (FEM)  [183,  152]  and  finite  difference  time  domain  (FDTD)  method  [178,  145,  131], 
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which  require  a  discretization  of  the  entire  volume  of  the  3D  scene.  The  compute  and  memory  usages  of 
these  methods  scale  linearly  with  the  volume  of  the  scene.  Faster  methods  like  pseudospectral  time  domain 
(PSTD)  [91]  and  adaptive  rectangular  decomposition  (ARD)  [124]  have  been  proposed  and  achieve  good 
accuracy  with  a  much  coarser  spatial  discretization.  Volumetric  techniques  are  well-suited  for  scenes  with 
high  surface  area  and  low  air  volume,  which  makes  them  highly  applicable  to  indoor  spaces. 

Surface-based  techniques  are  better  suited  for  open  scenes,  for  which  scattering  geometry  is  sparse  with 
large  regions  of  air  with  uniform  wave-propagation  speed.  The  most  common  approach  here  is  the  boundary 
element  method  (BEM)  [35]  that  expresses  the  global  acoustic  field  as  the  sum  of  elementary  radiating  fields 
from  monopole  and  dipole  sources  placed  on  a  uniform,  sub-wavelength  sampling  of  the  scene’s  surface. 
Traditional  BEM  scales  as  the  square  of  the  surface  area  but  recent  research  on  the  fast  multipole  method 
for  BEM  (FMM-BEM)[92,  54]  has  improved  the  complexity  to  linear  in  surface  area  by  creating  a  hierarchical 
clustering  of  BEM  monopoles  and  dipoles  using  an  octree,  and  approximating  their  interactions  compactly 
using  high-order  multipole  Green’s  functions.  Offline  FMM-BEM  solutions  are  infeasible  for  interactive 
applications  because  of  the  large,  dense  number  of  monopole  and  dipole  sources  in  the  final  solution  that 
need  to  be  stored  and  summed  on  the  fly. 

For  acoustic  radiation  and  scattering  problems,  an  efficient  and  powerful  surface-based  technique  is 
the  equivalent  source  method  (ESM)  [44,  84,  106,  113]  that  forms  the  basis  of  our  formulation.  Instead 
of  relying  on  a  boundary-integral  formulation,  as  BEM  does,  ESM  exploits  the  uniqueness  of  solutions 
to  the  acoustic  boundary  value  problem.  Equivalent  multipole  sources,  Green’s  functions,  are  placed  at 
variable  locations  in  space  with  the  intent  of  making  the  total  generated  held  match  boundary  conditions 
on  the  object’s  surface,  since  uniqueness  guarantees  the  correctness  of  the  solution  (Section  3  in  [104]).  The 
flexibility  of  location  results  in  fewer  multipole  sources.  The  ESM  can  yield  large  gains  in  performance  and 
memory-efficiency  for  scattering  and  radiation  problems  in  large  spaces,  and  has  been  used  widely  in  both 
acoustic  and  electromagnetic  applications  [40].  Equivalent  sources  were  introduced  to  computer  graphics  in 
the  seminal  work  of  [71]  on  sound  generation  from  vibrating  objects.  ESM  is  an  attractive  starting  point 
for  such  precomputation-based  approaches,  and  our  method,  because  it  allows  very  flexible  performance- 
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to-accuracy  tradeoffs.  More  importantly,  the  compactness  of  the  solutions  reduces  runtime  memory  and 
compute  requirements  by  a  large  factor,  making  them  amenable  to  real-time  evaluation. 

A  related  technique,  called  the  transition-matrix  method,  has  been  used  extensively  for  electromagnetic 
scattering,  and  also  developed  for  acoustics  [170].  The  method  relates  the  incoming  and  outgoing  fields  in 
the  scattering  process  in  terms  of  the  coefficients  of  a  complete  system  of  vector  basis  functions  that  are  not 
necessarily  Green’s  functions,  unlike  BEM  or  ESM. 

2.3.2  Interactive  Geometric  Techniques 

Most  current  interactive  sound  propagation  systems  are  based  on  geometric  acoustics,  which  applies  the 
high-frequency  Eikonal  (ray)  approximation  to  sound  propagation.  The  image  source  method  [6]  is  the  most 
commonly  used  geometric  technique,  and  there  has  been  much  research  on  improving  its  performance  [45]. 
However,  the  image  source  method  can  only  model  purely  specular  reflections.  Other  techniques  based  on  ray 
tracing  [83,  166,  88]  or  radiosity  [159]  have  been  developed  for  modeling  diffuse  reflections,  but  these  energy- 
based  formulations  may  not  model  phase  accurately.  Techniques  based  on  acoustic  radiance  transfer  [138,  139] 
can  model  arbitrary  surface  interactions  for  wide-band  signals,  but  cannot  accurately  model  wave  phenomena 
such  as  diffraction.  The  two  main  approaches  for  modeling  diffraction  in  a  geometric  acoustics  framework 
are  the  uniform  theory  of  diffraction  (UTD)  [157]  and  the  Biot-Tolstoy-Medwin  (BTM)  formulation  [143]. 
UTD  is  an  approximate  formulation,  while  the  BTM  yields  accurate  results  with  a  significant  performance 
cost.  Methods  based  on  image  source  gradients  [155]  and  acoustic  radiance  transfer  operators  [9]  have  been 
developed  to  interactively  model  higher-order  propagation  effects.  Recent  developments  in  fast  ray  tracing 
have  enabled  interactive  geometric  propagation  in  dynamic  scenes,  but  these  methods  only  model  first-order 
edge  diffraction  based  on  UTD  [151]. 

2.3.3  Interactive  Wave-simulation  Techniques 

In  recent  years,  we  have  seen  increasing  interest  in  developing  interactive  wave-simulation  techniques  for 
sound  propagation  in  indoor  and  outdoor  spaces.  Sound  radiation  from  a  single  vibrating  object  in  free 
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space  can  be  efficiently  modeled  using  precomputed  acoustic  transfer  [71] .  These  acoustic  transfer  functions 
approximate  the  radiation  behavior  of  a  complicated  geometry  by  expressing  it  in  terms  of  equivalent  sources, 
which  can  be  quickly  evaluated  at  runtime  to  enable  real-time  performance.  Tsingos  et  al.  [156]  solves  the 
boundary  integral  formulation  of  the  Helmholtz  equation  subject  to  the  Kirchhoff  approximation  in  real¬ 
time.  Raghuvanshi  et  al.  [125]  relies  on  a  volumetric  sampling  of  acoustic  responses  on  a  spatial  grid 
and  perceptual  encoding  based  on  the  acoustic  properties  of  indoor  spaces.  Recent  work  [135]  has  shown 
that  FDTD  simulations  can  run  in  real-time  on  the  GPU,  but  only  for  very  small  spaces  that  span  a  few 
meters  across.  We  compare  our  method  in  more  detail  with  these  closely  related  interactive  wave-simulation 
techniques  in  later  sections. 


2.4  The  Equivalent  Source  Method 

In  this  section,  we  give  a  brief  review  of  the  Equivalent  Source  Method.  Consider  the  exterior  scattering 
problem  [154],  a  solid  three-dimensional  object  A  immersed  in  an  unbounded  air  volume  (see  Figure  2.2(a)). 
Considering  only  time-harmonic  vibrations,  with  angular  frequency  u>  and  a  homogeneous  medium  with 
constant  speed  of  sound  c,  acoustic  wave  propagation  can  be  expressed  as  a  boundary  value  problem  for  the 
Helmholtz  equation: 
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V2p- 1 — Tp  =  0  in  A+ ,  (2.1) 
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where  p  is  the  (complex-valued)  pressure  field,  A+  is  the  domain  exterior  to  the  object,  and  V2  is  the 
Laplacian  operator.  At  the  boundary  of  the  domain,  dA ,  the  pressure  is  specified  using  a  Dirichlet  boundary 
condition: 

p  =  /(x)  on  dA.  (2.2) 


To  complete  the  problem  specification,  the  behavior  of  p  at  infinity  must  be  specified,  usually  by  the  Som- 
merfeld  radiation  condition  [116]: 
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where  r  =  ||x||  is  the  distance  of  point  x  from  the  origin  and  j  =  y/—l.  The  equivalent  source  method  [104, 
106,  113]  relies  on  the  existence  of  fundamental  solutions  also  called  Green’s  functions  or  equivalent  sources 
g(x,  y),  of  the  Helmholtz  equation  (2.1)  subject  to  the  Sommerfeld  radiation  condition  (2.3)  for  all  x/y. 
An  equivalent  source  g(x,  y,)  is  the  solution  field  induced  at  any  point  x  due  to  a  point  source  located  at 
y i,  and  can  be  expressed  as  the  sum: 

L—l  l  L2 

g(x,  yj)  =  y;  y  dumtpum(x.)  =  y^d;:fcy>ifc(x),  (2.4) 

1=0  m=—l  k—  1 

where  k  is  a  generalized  index  for  ( l,m ).  The  fundamental  solution  <^;m(x)  is  the  field  due  to  a  multipole 
source  located  at  y,;,  dum  is  its  strength,  and  L  is  the  order  of  the  multipole  (L  =  1  is  just  a  monopole, 
L  =  2  includes  dipole  terms  as  well,  and  so  on).  The  held  due  to  a  multipole  located  at  point  y,;  is  defined 
as 

/o\ 

<Pilm(x)  =  Tlmh]  ' (’Wri/c)lpirn(0i,  <j>i)  (2.5) 

where  ( ri,0i,<j>i )  is  the  vector  (x  —  y.j)  expressed  in  spherical  coordinates,  h\  {wri/c)  are  the  (complex¬ 
valued)  spherical  Hankel  functions  of  the  second  kind  [2],  ripim{dii4,i)  are  the  (complex- valued)  spherical 
harmonic  functions  [63] ,  and  T is  the  (real- valued)  normalizing  factor  that  makes  the  spherical  harmonics 
orthonormal. 

The  fundamental  solutions  <pum(x)  (or  <^jfc(x))  are  used  to  solve  the  Helmholtz  equation.  Consider  the 
outgoing  scattered  held  due  to  an  object,  and  the  associated  Dirichlet  boundary  value  problem  on  dA. 
Consider  a  discrete  set  of  R  source  locations  {y,;}^!,  all  contained  in  the  interior  region  A~ .  The  total  held 
due  to  these  sources  at  any  x  G  A+  is 

R  R  L2 

P(x)  =  y*)  =  X!  X]  cikVik(x),  (2.6) 

i—1  i=  1  k—1 

where  c.n~  =  c.id ^  are  the  corresponding  strengths  of  the  equivalent  sources.  The  main  idea  of  the  ESM 
is  that  if  the  equivalent  source  strengths  Cik  and  positions  y ,  are  chosen  to  match  the  Dirichlet  boundary 
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Figure  2.2:  a)  A  diagram  illustrating  a  radiating  object  A,  its  corresponding  boundary  dA ,  exterior  region 
A+ ,  interior  region  A-,  and  the  set  of  equivalent  sources  (denoted  by  star  shapes),  b)  Classification  of 
objects  in  a  scene.  The  triangle  and  rectangle  constitute  a  single  object,  as  their  offset  surfaces  overlap.  On 
the  other  hand,  L-shaped  shapes  are  classified  as  separate  objects. 


condition  on  dA1 

R  L2 

p(x)  =  EE  CifcPifc(x)  =  /(x);  xe<9  A,  (2.7) 

i— 1  /c— 1 

then  p(x)  is  the  correct  solution  over  all  A+. 

This  process  can  also  be  used  to  represent  the  incident  field  of  an  object,  the  only  difference  being  that 
the  equivalent  sources  are  now  placed  in  the  exterior  region  A+.  Again,  by  matching  the  boundary  condition 
(2.7),  we  get  the  correct  solution  p(x)  for  all  x  in  the  interior  region  A-. 

In  practice,  the  boundary  conditions  (2.7)  can  only  be  satisfied  approximately  for  a  finite  value  of  R, 
and  the  degree  of  approximation  can  be  controlled  by  changing  R.  Since  the  strengths  of  multipoles  of 
each  source  must  be  stored  and  its  contribution  evaluated  at  runtime,  R  is  the  main  parameter  for  trading 
accuracy  for  runtime  performance  and  memory  requirements.  This  flexibility  makes  ESM  highly  suitable  for 
interactive  applications. 


2.5  Sound  propagation  using  ESM 


We  give  a  brief  overview  of  the  precomputation  and  runtime  stages  of  our  technique  (see  Figure  2.3).  Our 
formulation  is  in  the  frequency  domain.  We  construct  a  complex  frequency  response  (containing  magnitudes 
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Figure  2.3:  Overview  of  our  wave-based  sound  propagation  technique  based  on  equivalent  sources  on  a  simple 
scene  composed  of  two  objects  and  a  sound  source  (shown  with  a  red  dot).  The  magnitudes  of  pressure  fields 
are  visualized  using  the  color  scheme  shown. 


and  phases),  at  regularly  sampled  frequencies,  to  model  the  delay  information  in  the  propagated  sound. 
Thus,  the  steps  outlined  in  this  section,  except  the  offset  surface  calculation,  need  to  be  performed  for  a 
regularly  sampled  set  of  frequencies  in  the  range  [0,  i/max],  where  z'max  is  the  maximum  simulated  frequency. 
We  assume  that  the  scene  is  composed  of  static  objects.  Table  2.1  provides  a  list  of  commonly  used  symbols. 


2.5.1  Our  Approach 

Offset  surface  calculation:  In  the  preprocessing  stage,  we  classify  objects  in  the  scene  and  calculate  the 
offset  surface  for  each  object. 

Per-object  transfer  function:  For  each  object,  we  compute  a  per-object  transfer  function  that  maps  the 
incoming  field  incident  on  the  object  to  the  outgoing  scattered  field. 

Inter-object  transfer  function:  For  each  object  pair,  we  precompute  an  inter-object  transfer  function 
that  encodes  how  the  outgoing  scattered  field  of  one  object  becomes  the  incoming  field  for  the  other  object. 
Global  solve:  Based  on  the  per-object  and  inter-object  transfer  functions  and  a  sound  source  configuration, 
we  model  acoustic  interactions  between  the  objects  in  the  scene  and  solve  for  the  global  sound  field.  Thereby, 
we  compute  the  strengths  of  all  the  outgoing  scattered  field  equivalent  sources  of  all  objects. 

Run-time  pressure  evaluation:  At  runtime,  we  add  the  pressure  produced  at  the  listener  position  by 
all  outgoing  field  equivalent  sources,  for  each  frequency.  This  is  an  extremely  fast  computation,  and  can  be 
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performed  for  a  moving  listener  in  real-time. 


2.5.2  Offset  Surface  Calculation 

The  first  step  is  to  decompose  the  input  scene  into  well-separated  objects.  To  decide  if  two  objects  are 
well-separated,  we  use  the  notion  of  an  offset  surface.  The  offset  surface  is  defined  by  taking  the  constant 
offset  along  the  normal  direction  at  each  point  on  the  boundary  of  the  object.  Two  objects  are  considered 
disjoint  if  and  only  if  their  offset  surfaces  do  not  intersect.  Otherwise,  we  combine  them  and  treat  them 
as  a  single  object  (see  Figure  2.2(b)).  We  compute  the  offset  surface  of  an  object  using  distance  field  and 
the  marching  cubes  algorithm  similar  to  James  et  al.  [71].  Typical  values  of  distance  field  voxel  resolution 
h  and  offset  distance  S  are  specified  in  Table  2.2.  The  offset  surface  serves  as  the  boundary  of  the  domain 
dA.  After  decomposing  the  scene  into  well-separated  objects,  we  compute  the  scattering  properties  for  each 
object  independently. 

Symbols  Meaning 

q\n ,  q°ut  ith  Sz  jth  eq.  src  for  incoming,  outgoing  field  resp. 

Pjh*  kth  &  hth  multipole  term  of  eq.  src.  q\n  8z  q°ut  resp. 

Q,  P  number  of  incoming,  outgoing  eq.  srcs  resp. 

M,  N  order  of  incoming,  outgoing  field  multipoles  resp. 

Table  2.1:  Table  of  commonly  used  symbols. 

2.5.3  Per-object  Transfer  Function 

In  order  to  capture  an  object’s  scattering  behavior,  we  define  the  per-object  transfer  function  /,  a  function 
which  maps  an  arbitrary  incoming  field  reaching  the  object  to  the  corresponding  outgoing  scattered  field  after 
reflection,  scattering  and  diffraction  due  to  the  object  itself.  This  function  is  linear  owing  to  the  linearity  of 
the  wave  equation  and  depends  only  on  the  shape  and  material  properties  of  the  object. 

The  incoming  and  outgoing  fields  for  an  object  A  are  both  expressed  using  equivalent  sources.  The 
outgoing  field  is  represented  by  placing  equivalent  sources  {qfut ,  q^,  qfut , ...}  in  the  interior  region  A~  of 
the  object.  Similarly,  the  incoming  field  is  represented  by  placing  equivalent  sources  {qf1,  q™,  q™,  •••}  in  the 
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exterior  region  A+.  The  transfer  function  /  maps  the  basis  of  the  incoming  field  (multipoles  <plik)  to  the 
corresponding  outgoing  field  expressed  as  a  linear  combination  of  its  basis  functions  (multipoles  ) : 

( P,N 2) 

/(¥&)  =  E  <*%*»■’  (2-8) 

/«) 

/faE) 

/Hv) 

where  =  T^(i/c,j/i)  is  the  (complex) 
amplitude  incoming  multipole  The  per-object  sound  transfer  function  for  object  A  is  encoded  in  the 
coefficient  matrix  Ta ,  which  we  call  the  scattering  matrix.  We  now  explain  how  to  compute  the  (complex) 
amplitudes  alkh  of  the  outgoing  field  multipoles.  Details  on  choosing  the  number  and  positions  of  incoming 
and  outgoing  equivalent  sources  are  given  in  Section  2.5.5. 

Computing  the  Scattering  Matrix  For  each  incoming  field  multipole  < in  turn,  we  place  a  unit- 
amplitude  sound  source  and  use  a  numerical  wave  solver  to  compute  the  total  pressure  field  at  n  uniformly- 
sampled  locations  {x1;  x2, ...,  xn}  on  dA.  We  subtract  the  incident  field  from  the  total  pressure  field 
to  compute  the  outgoing  scattered  field  at  these  sampled  locations  (see  Figure  2.4),  denoted  by  pik  = 
{p(xi),p(x2),...,_p(x„)}. 

We  fit  the  outgoing  field  multipole  expansion  to  the  sampled  scattered  field,  in  a  least-squares  sense, 
by  solving  an  over-determined  linear  system  (n  >  PN2)  subject  to  a  pre-specified  error  threshold  er  for  all 
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amplitude  for  the  outgoing  multipole  induced  by  a  unit- 
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incoming  field  multipoles: 


( P,N 2) 

^2  V°jh  (xt)  afh  =  p  (xt) ,  for  t  =  1, n;  (2.10) 

0»=(i.i) 

Vaife  =  pifc.  (2.11) 

The  least-squares  solution  yields  the  coefficients  azk  corresponding  to  the  row  of  the  scattering  matrix 
T.  This  process  is  repeated  for  all  incoming  field  multipoles  to  compute  the  scattering  matrix.  The  solution 
can  be  computed  efficiently  using  a  single  combined  linear  system 
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where  Tj[  is  the  transpose  of  Ta ■  The  per-object  transfer  function  is  computed  for  all  objects  at  sampled 
frequencies.  The  error  threshold  a  is  used  while  deciding  the  number  and  placement  of  equivalent  sources 
(Section  2.5.5)  such  that  the  above  linear  system  gives  error  less  than  er. 


2.5.4  Inter-object  Transfer  Function 

Scenes  with  multiple  objects  exhibit  object-to-object  interactions,  where  the  outgoing  field  from  one  object 
serves  as  the  incoming  held  for  the  other  objects.  For  example,  with  two  objects  A  and  B ,  source  s  and 
listener  l,  the  possible  interactions  that  can  occur  from  s  to  l  are:  direct  sound  ( 0th  order)  s  — >  l,  1st  order 
s  — >  A— >  l;s  — >  B  — >  l,  2nd  order  s  — >  A  — >  B  — >  l;s  — >  B  — >  A  — >  l,  and  so  on.  We  model  these  interactions 
by  formulating  an  inter-object  transfer  function.  For  two  objects  A  and  B ,  the  inter-object  transfer  function 
expresses  the  outgoing  held  of  A  in  terms  of  the  basis  of  the  incoming  held  of  B.  Like  the  per-object 
transfer  function,  the  inter-object  transfer  function  is  also  a  linear  function.  The  inter-object  transfer  function 
gB  maps  each  basis  function  of  the  outgoing  held  of  A  (multipoles  yffl)  to  the  corresponding  incoming  held 
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Figure  2.4:  Magnitude  of  the  pressure  field  (in  Pa)  at  170  Hz  in  a  simple  scene  with  a  single  object  (rocks) 
and  a  single  sound  source  (red  dot).  The  difference  between  total  and  incident  fields  is  the  scattered  field 
(scaled  eight  times  for  visualization).  Note  the  high  amplitude  of  the  scattered  field  between  the  rocks 
representing  the  large  difference  in  incident  and  total  field  that  results  from  diffracted  occlusion. 


of  B  expressed  as  a  linear  combination  of  its  basis  functions  (multipoles  ^Jl): 


( Q,m 2) 

„B(,„out\  \  A  Qjh,An, 

9AvPjh  )  /  j  Pik^Pik'i 

(i,k)=(l,l) 


(2.13) 


37 


siKf) 

fill  ■ 

ft11 

■■  Pqm 2 

pYl 

QAivi?) 

— 

fill 

fill  • 

012 

■■  Pqm 2 

QA^Pm) 

fi™2 

fiiT2  • 

Pp  TV2 

Pqm 2 

Vqm* 

—  (~'Bi hin 


(2.14) 


where  f3jj!  =  G^{jh,ik)  is  the  (complex)  amplitude  of  the  incoming  multipole  of  B  induced  by  a  unit- 
amplitude  outgoing  multipole  of  A.  The  inter-object  transfer  function  from  A  to  B  is  thus  encoded  as 
G®,  which  we  call  the  interaction  matrix.  Generally,  the  interaction  matrix  is  not  symmetric,  i.e. ,  G^  GB. 
Since  the  outgoing  held  of  an  object  is  not  defined  in  its  interior  region,  Gj|  and  GB  are  zero  matrices.  We 
now  explain  how  to  compute  the  (complex)  amplitudes  /3)/1  of  the  incoming  Held  multipoles. 


Computing  the  Interaction  Matrix  The  interaction  matrix  G^  can  be  computed  using  a  least-squares 
formulation  similar  to  the  one  used  for  computing  scattering  matrices.  However,  the  pressure  values  at  the 
offset  surface  samples  of  B,  p  =  {p(xi),p(x2),  ...,p(x„)}  are  simpler  to  compute.  In  a  homogenous  medium, 
the  outgoing  Held  due  to  a  multipole  is  the  same  as  the  free  space  field,  for  which  analytical  expressions  exist 
(Equation  2.5).  Therefore,  we  simply  evaluate  the  analytical  expressions  of  the  outgoing  field  multipoles 
pji'f  of  A  at  the  sample  points  on  the  offset  surface  of  B.  The  resulting  linear  system  is  solved  subject  to  a 
separate  error  threshold,  : 

(Q,M2) 

Y  ^  (x‘)  Pi'k  =  P  (x‘)  >  for  1  =  !»  n ■  (2.15) 

(*,fe)=(i,i) 


Again,  this  process  is  repeated  for  each  outgoing  multipole  of  B,  and  solved  efficiently  as  a  single  combined 
linear  system: 


U  G 


Btr 

A 
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(2.16) 
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The  inter-object  transfer  functions  are  computed  for  all  object  pairs,  independently  for  each  frequency. 

2.5.5  Computing  Equivalent  Source  Positions 

Choosing  Offset  Surface  Samples  Solving  equations  (2.12)  and  (2.16)  at  frequency  v  involves  computing 
the  pressure  at  sampled  locations  {xl7  x2, ...,  x„}  on  the  offset  surface  of  each  object.  The  number  of  sampled 
locations  n  depends  on  the  spatial  variation  of  the  pressure  held,  which  in  turn,  depends  directly  on  its 
frequency  v  or  inversely  on  its  wavelength  A  since  v  =  c/A.  As  per  the  Nyquist  Theorem,  representing  a  signal 
of  frequency  v  with  a  finite  number  of  samples  requires  a  sampling  rate  of  2v.  The  spatially-varying  pressure 
held  dehned  on  the  2D  offset  surface  must  be  sampled  at  a  rate  of  2v  in  both  dimensions.  The  distance 
between  samples  become  2z//c  =  2/A.  Therefore,  we  place  n  oc  (2 v/c)2x  surface  area  =  (2/A )2x  surface  area 
samples  uniformly  on  the  offset  surface. 

Choosing  Incoming  Equivalent  Sources  Since  the  nature  of  the  incoming  held  is  not  known  a  priori, 
it  is  difficult  to  optimize  the  number  and  position  of  incoming  equivalent  sources.  We  resolve  this  problem 
by  generating  another  offset  surface  at  distance  A  >  8  from  the  object,  where  8  is  the  original  offset  surface’s 
distance,  and  placing  incoming  equivalent  sources  on  this  new  surface  (see  Table  2.2  for  the  value  of  A).  The 
number  of  incoming  equivalent  sources  Q  depends  on  the  spatial  variation  of  the  incoming  pressure  held.  As 
before,  Q  oc  (2/A )2x  surface  area  equivalent  sources  are  uniformly  placed.  This  allows  us  to  represent  the 
incoming  held  on  the  inner  offset  surface  to  good  accuracy. 

Choosing  Outgoing  Equivalent  Sources  The  number  of  outgoing  equivalent  sources  P  and  their  po¬ 
sitions  are  decided  based  on  a  multi-level  source  placement  algorithm  similar  to  James  et  al.  [71].  The 
previous  algorithm  was  designed  to  satisfy  a  single  radiating  held  j5  of  an  object  at  each  frequency.  It  places 
equivalent  sources  in  a  greedy  manner,  where  at  each  step  a  set  of  candidate  positions  %  are  ranked  based 
on  their  ability  to  reduce  the  pressure  residual  vector  r  =  p/||p||2  on  the  offset  surface.  The  best  candidate 
position  x*  is  chosen  via  the  largest  projection,  i.e. ,  x*  =  arg  maxx£xtt,  where  projection  u  =  ||({7x)i?r||2. 
The  unitary  matrix  corresponding  to  the  subspace  spanning  all  the  previously  selected  positions  is  updated. 
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The  residual  vector  is  updated  by  removing  its  component  in  that  subspace.  The  process  is  repeated  until 
the  value  of  the  residual  | |r 1 1 2  falls  below  the  error  tolerance.  The  set  of  best  candidate  positions  selected  in 
the  process  is  the  set  of  outgoing  equivalent  sources  and  its  size  gives  us  the  value  of  P. 

Our  algorithm  is  designed  to  satisfy  multiple  outgoing  radiating  fields  at  each  frequency  simultaneously. 
In  our  case,  at  each  frequency,  we  have  as  many  outgoing  radiating  fields  pn  Vqm2  as  numker 
of  incoming  multipoles  QM2 .  This  gives  us  a  vector  of  pressure  residual  vectors  r  =  [fn  .  .  .  T'qm2]  and 
a  corresponding  vector  of  projections  u  =  [«n  .  .  .  uqm2]  where  Uik  =  |  |(E4:)H?ifc||2-  We  choose  the  best 
candidate  as  the  one  that  minimizes  the  pressure  residual  of  all  outgoing  fields  simultaneously  via  a  modified 
largest  projection  x*  =  arg  max^gjjlul^.  We  update  the  unitary  matrix  and  for  each  residual  vector  we 
remove  its  component  in  the  chosen  subspace.  We  then  compute  the  value  of  the  modified  residual  1 1 d|  1 2 , 
where  d  =  [d\\  .  .  .  (Iqm2\  and  dik  =  |  Ir^A;  1 1 2 -  We  repeat  this  process  until  the  relative  value  of  the  modified 
residual  falls  below  the  error  tolerance  (a  in  our  case) .  Similar  to  the  number  of  incoming  equivalent  sources 
<5,  the  number  of  outgoing  equivalent  sources  P  also  increases  with  frequency.  But  it  strongly  depends  on 
the  shape  of  the  object  and  the  complexity  of  the  outgoing  scattered  held  that  the  object  generates.  We 
fit  as  many  equivalent  sources  as  necessary  to  satisfy  the  error  threshold.  As  the  frequency  increases,  more 
equivalent  sources  are  needed  but  the  accuracy  of  our  technique  is  maintained.  The  candidate  positions  x 
are  chosen  randomly  on  the  surface  of  the  object  in  the  same  manner  as  the  previous  algorithm.  However, 
a  minimum  distance  between  any  two  equivalent  sources  is  enforced  to  improve  the  condition  number  of 
the  system;  extremely  close  equivalent  sources  dominate  the  eigenvalues  of  the  resulting  system,  adversely 
affecting  its  condition  number.  We  choose  a  minimum  distance  of  half  the  wavelength  at  any  given  frequency. 

2.5.6  Global  Solve 

Once  the  scattering  and  interaction  matrices  are  computed,  and  the  sound  source  position  has  been  decided, 
we  solve  for  the  global  sound  field  and  compute  the  outgoing  equivalent  source  strengths  of  all  the  objects  in 
the  scene.  The  sound  source  can  be  a  point  source  or  a  complex  directional  source  (represented  as  a  set  of 
multipoles).  We  give  an  intuitive  explanation  here  for  a  simple  two-object  scene  and  the  detailed  derivation 
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can  be  found  in  Appendix  2.9.1.  For  a  scene  composed  of  multiple  objects,  we  derive  the  same  equation  with 
the  symbols  having  analogous  meanings,  as  described  in  detail  in  Appendix  2.9.2. 

Assume  the  outgoing  field  in  the  scene  is  C.  This  held  when  propagated  through  the  scene,  transferred 
via  all  possible  object  pairs  using  interaction  matrix  G,  generates  an  incoming  held  GC  that,  in  addition  to 
the  source  held  S,  generates  the  total  incoming  held  (GC  +  S)  on  the  objects.  This  incoming  held  is  then 
scattered  by  the  object,  via  scattering  matrix  T,  to  produce  an  outgoing  held  T(GC  +  S).  Under  steady 
state,  this  outgoing  held  must  equal  C.  Mathematically,  this  can  be  written  as 

C  =  T(GC  +  S)  (2.17) 

This  yields  a  linear  system  for  the  outgoing  source  strengths  for  all  objects: 

(I  —  TG)C  =  TS.  (2.18) 

This  linear  system  is  solved  for  C  at  a  regularly-sampled  set  of  frequencies.  This  step  has  to  be  repeated  for 
every  sound  source  generating  a  distinct  source  held  S.  In  the  absence  of  a  source,  the  solution  is  identically 
zero. 

2.5.7  Runtime  Computation 

At  the  end  of  the  preprocessing  stage,  we  obtain  the  outgoing  equivalent  source  strengths  for  all  objects  at 
a  regularly  sampled  set  of  frequencies  corresponding  to  each  sound  source.  During  run-time,  we  use  these 
strengths  to  compute  the  pressure  held  at  any  listener  position  x: 


p(x)  =  ^GX^t(x)  +  S(x),  (2.19) 

t=i 

where  n  is  the  number  of  objects  in  the  scene,  C^.  and  are  the  strengths  and  multipoles  of  the  outgoing 
equivalent  sources  for  object  Aj  respectively,  and  s  (x)  is  the  held  generated  by  the  sound  source.  This 
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computation  is  performed  at  a  regularly-sampled  set  of  frequencies  and  repeated  for  each  source  to  compute 
a  band-limited  frequency  response  per  source.  Evaluating  equation  2.19  for  a  new  value  of  x  is  very  efficient, 
allowing  a  moving  listener  to  be  handled  naturally  in  real-time.  Since  the  analytical  expressions  for  multipoles 
of  equivalent  sources  are  used,  the  pressure  can  be  evaluated  at  any  position  x  in  space  and  not  necessarily 
at  grid  positions.  Therefore,  no  spatial  interpolation  is  required  with  our  technique.  Unlike  grid-based 
approaches  (such  as  FDTD),  our  equivalent  source  method  is  independent  of  the  spatial  discretiziation, 
resulting  in  a  much  smoother  auralization  for  a  moving  listener. 

Our  technique  allows  auralization  in  a  scene  with  multiple  static  sources  and  a  moving  listener.  We  can 
also  handle  the  case  of  multiple  moving  sources  and  a  static  listener.  First,  we  start  with  a  scene  with  a 
static  source  and  compute  acoustic  responses  at  multiple  moving  listeners  using  our  runtime  system.  The 
principle  of  acoustic  reciprocity  states  that  we  can  reverse  the  sense  of  source  and  listener  without  changing 
the  acoustic  response  [116,  p.  195-199].  Using  this  principle,  we  now  switch  the  roles  of  source  and  listeners 
while  keeping  the  acoustic  responses  the  same.  This  gives  us  acoustic  response  for  the  case  of  multiple 
moving  sources  with  a  static  listener. 

2.6  Implementation 

In  this  section,  we  describe  the  implementation  details  of  our  technique.  Typical  parameter  values  used  in 
our  experiments  are  specified  in  Table  2.2. 

Implementation  Details  The  offset  surface  generation  code  is  written  in  C++.  When  computing  per- 
object  transfer  functions,  outgoing  scattered  fields  are  computed  on  the  offset  surface  (see  Section  2.5.3)  using 
an  efficient  GPU-based  implementation  of  the  ARD  wave-solver  [124,  100].  The  solver  treats  the  scattering 
objects  as  rigid  (with  no  transmission)  and  handles  the  material  properties  using  perfectly  matched  layer 
interfaces.  The  remaining  parts  of  the  preprocessing  stage,  solving  the  linear  system  for  per-object  transfer 
functions,  inter-object  transfer  functions,  and  equivalent  source  strengths,  are  implemented  in  MATLAB. 
The  runtime  code  is  implemented  in  C-| — b,  and  has  also  been  integrated  with  Valve’s  Source™  engine,  as 
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demonstrated  in  the  supplementary  video. 

The  timing  results  for  offset  surface  generation,  the  ARD  solver,  and  runtime  code  are  measured  on  a 
single  core  of  a  4-core  2.80  GHz  Xeon  X5560  desktop  with  4  GB  of  RAM  and  NVIDIA  GeForce  GTX  480 
GPU  with  1.5  GB  memory.  Offset  surface  generation  code  takes  <  lsec  for  each  object.  The  timing  results 
for  the  MATLAB-based  precomputation  are  measured  on  a  64-node  CPU  cluster  (Xeon  X5560  processor 
nodes,  8  cores,  2.80  GHz,  48  GB).  Detailed  statistics  are  provided  in  Table  2.3.  Precomputation  for  each 
frequency  is  performed  in  parallel  over  all  the  nodes  (and  individual  cores)  of  the  CPU  cluster.  Given  more 
nodes  on  the  cluster,  the  per-object,  inter-object,  and  source-field  computations  can  be  further  parallelized 
over  all  unique  objects,  all  object-pairs,  and  all  objects,  respectively. 

Due  to  the  computational  overhead  of  the  precomputation  stage,  we  treat  band-limited  sources  that  emit 
sound  whose  frequency  range  is  bounded  by  maximum  frequency  vmax  (see  Table  2.2),  for  the  purpose  of 
wave  simulations  (see  Section  2.5.3).  The  pressure  is  computed  at  regularly  sampled  set  of  frequencies  in 
the  range  [0,  i/max]  with  a  step  size  of  Aza  The  value  of  parameter  A v  is  4.08  Hz  for  concave,  wall,  rock,  and 
parallel  walls  scenes  and  2.04  Hz  for  desert,  reservoir,  and  Christmas  scenes. 

Handling  Ground  Reflections  To  handle  ground  reflections,  we  assume  the  ground  to  be  an  infinite 
plane.  Similar  to  the  image  source  method  [6],  we  reflect  our  equivalent  sources  about  the  ground  plane 
and  multiply  their  source  strengths  by  the  (complex)  reflection  coefficient  of  the  ground.  Since  sound  waves 
traveling  in  air  maintain  their  phase  upon  reflection  from  a  hard  surface,  we  do  not  need  to  invert  the 
strengths  of  the  equivalent  sources.  To  incorporate  last  ground  reflection,  this  step  is  performed  after  the 
“Global  solve”  step  (see  Section  2.5.6).  In  order  to  handle  all  levels  of  ground  reflections  before  that,  this  step 
needs  to  be  performed  while  computing  the  interaction  matrices  as  well  (see  Section  2.5.4).  More  accurate 
physical  models  of  ground  reflection  coefficient  based  on  Darcy’s  law  can  also  be  used  [147] .  The  assumption 
of  infinite  flat  plane  works  very  well  for  cases  where  the  size  of  the  ground  perturbations  is  smaller  than 
the  minimum  wavelength  simulated  (34cms  for  1kHz).  For  cases  where  the  ground  contains  terrain  features 
that  are  much  larger,  like  hillocks,  these  can  be  handled  as  separate  objects  in  our  ESM  framework.  Due 
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to  the  increased  number  of  objects,  the  precomputation  time  and  runtime  memory  would  increase,  but  the 


accuracy  of  our  technique  would  be  maintained. 


Parameter 

Value 

Description 

c 

340  m/s 

speed  of  sound 

Z'max 

1  kHz 

highest  frequency  simulated 

h 

c/2i/max  —  0.17  m 

voxel  resolution  of  distance  field 

8 

5 h  —  0.85  m 

inner  offset  distance 

A 

8h  —  1.36  m 

outer  offset  distance 

G 

15% 

error  threshold  for  scattering  matrix 

V 

1% 

error  threshold  for  interaction  matrix 

M,  N 

2 

order  of  incoming,  outgoing  multipoles  resp. 

Table  2.2:  Parameters  used  in  our  system. 


Spectral  Extrapolation  The  band-limited  nature  of  the  frequency  responses  of  our  technique  necessitates 
a  plausible  extrapolation  to  higher  frequencies  at  runtime.  Prior  work  on  interactive  wave-based  methods 
has  shown  that  spectral  extrapolation  techniques  can  be  used  to  produce  plausible  results  for  higher  frequen¬ 
cies  [125].  However,  using  this  method  with  our  technique  would  incur  an  extra  inverse  FFT  cost  at  every 
audio  frame  for  time-domain  processing.  Therefore,  we  implemented  a  simple,  fast  extrapolation  technique 
based  on  the  edge-diffraction  spectra  [143].  As  observed,  the  typical  edge  diffraction  spectra  are  roughly 
linear  on  a  log-log  scale.  Hence,  we  first  estimate  a  trend-line  by  a  least-squares  fit  to  the  maximas  of  the  log 
magnitude  spectrum  till  ^max.  We  then  adjust  for  the  trend,  to  create  a  flat  response,  by  multiplying  with 
the  inverse  of  the  trend  on  a  log  frequency  scale.  This  adjusted  response  is  replicated  to  higher  frequencies 
and  then  multiplied  by  the  trend  again  for  the  entire  frequency  range,  yielding  the  final  wide-band  spectrum. 
If  the  trend-line  has  positive  slope,  indicating  a  high-pass  response,  we  flatten  the  trend-line  for  frequencies 
beyond  vmax.  This  extrapolation  technique  does  not  change  the  spectrum  up  to  ^max. 

We  evaluate  our  spectral  extrapolation  technique  by  comparing  the  audio  quality  of  the  results  with  the 
wide-band  spectrum  produced  by  the  Biot-Tolstoy-Medwin  (BTM)  technique  (0-22  kHz)  as  the  ground  truth 
for  the  single,  finite-edge  scenario  created  in  the  right-angled  wall  scene.  In  the  BTM  method,  edge  diffrac¬ 
tion  impulse  responses  are  computed  by  evaluating  a  time-domain  line  integral  over  the  finite  length  of  the 
edge.  This  is  essentially  based  on  Huygens  theory,  where  a  diffracting  sound  wave  is  modeled  as  a  superposi¬ 
tion  of  an  infinite  number  of  secondary  point  sources  situated  along  the  diffracting  edge,  each  with  different 
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strengths  and  directivities.  BTM  has  been  shown  to  converge  to  the  exact  analytical  solution  for  a  simple 
scene  like  this  [143].  We  use  the  MATLAB-based  edge  diffraction  toolbox  (http://www.iet.ntnu.no/  svens- 
son/software/index.htnrl)  to  generate  the  BTM  results.  As  shown  in  the  supplementary  video,  the  final 
auralized  audio  generated  by  both  the  techniques  on  this  scene  sound  similar. 

This  spectral  extrapolation  technique  is  approximate,  and  becomes  exact  in  a  specific,  single-edge  diffrac¬ 
tion  configuration.  It  does  not  guarantee  accuracy  on  general  scenes  at  high  frequencies.  While  single-edge 
diffraction  arises  frequently  in  outdoor  scenes,  many  other  complex  configurations  also  occur,  such  as  double¬ 
diffraction  and  diffracted-reflection.  Our  extrapolation  approach  would  be  accurate  in  such  cases  only  if  the 
acoustic  response  is  dominated  by  diffraction  from  a  single  edge.  In  other  situations,  we  have  observed 
that  our  extrapolation  approach  generates  plausible  results.  A  general  spectral  extrapolation  approach  for 
band-limited  acoustic  responses  with  guarantees  on  extrapolation  error  for  arbitrary  scenes,  is  an  important 
area  for  future  research. 

Real-Time  Auralization  The  sound  sources  used  in  our  implementation  play  a  pre-recorded  audio  clip. 
Audio  is  rendered  using  FMOD,  and  is  processed  in  frames  of  1024  audio  samples,  at  a  sampling  rate  of  44.1 
kHz.  In-game  (“dry”)  audio  clips  are  pre-processed  by  computing  a  windowed  short-time  Fourier  transform 
(STFT)  on  each  frame  (Blackman  window).  The  STFTs  are  computed  on  audio  frames  after  zero-padding  by 
the  maximum  impulse  response  length  to  prevent  aliasing  artifacts.  Real-time  auralization  is  performed  using 
overlap-add  STFT  convolutions.  In  each  rendered  frame,  the  dry  audio  frame  for  each  source  is  multiplied 
in  the  frequency-domain  with  the  corresponding  frequency  response.  The  results  are  then  mixed,  and  an 
inverse  FFT  performed  on  the  mixed  audio.  Finally,  overlap  from  previous  frames  is  added  in,  and  overlap 
from  the  current  frame  is  cached  in  a  ring  buffer.  Frequency  responses  are  updated  asynchronously  from  the 
actual  convolution  processing.  Spatialization  is  achieved  by  using  a  simplified  spherical  head  model  with  two 
listeners,  one  for  each  ear.  Richer  spatialization  that  uses  geometry  information  of  an  individual  listener’s 
ears,  head,  and  shoulders  can  be  modeled  using  head  related  transfer  functions  (HRTFs),  and  can  be  easily 
integrated  in  our  approach,  but  is  computationally  more  expensive. 
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Scene 

#objs. 

#  freq. 

if  srcs 

wave  sim. 

(total, 
per  obj.) 

per-object 
(per  freq) 

inter-object 
(per  freq) 

source  field 
(per  freq, 
per  src) 

global  solve 
(per  freq, 
per  src) 

wall  elk 
time 

if  eq.  srcs 
(total, 
per  src) 

eval. 
(total, 
per  src) 

storage 
(total, 
fixed  +  per  src) 

Concave 

1 

250 

1 

80  min 

51  min 

NA 

1  min 

0.1  min 

132  min 

0.1  M 

3  ms 

(1  +  4)  MB 

Wall 

1 

250 

1 

50  min 

101  min 

NA 

3  min 

0.1  min 

154  min 

0.1  M 

4  ms 

(2  +  5)  MB 

Rock 

1 

250 

1 

80  min 

87  min 

NA 

1  min 

0.1  min 

168  min 

0.4  M 

10  ms 

(4  +  11)  MB 

Parallel 

2* 

250 

1 

50  min 

101  min 

13  min 

6  min 

1  min 

171  min 

0.2  M 

8  ms 

(4  +  10)  MB 

Desert 

4*+2* 

500 

3 

180  min 

196  min 

98  min 

9  min 

26  min 

509  min 

1.1  M 

26  ms 

(12  +  33)  MB 

Reservoir 

4*+l 

500 

2 

146  min 

224  min 

63  min 

7  min 

15  min 

455  min 

1.3  M 

33  ms 

(15  4-  41)  MB 

Christmas 

2*+2*+l 

500 

2 

297  min 

301  min 

71  min 

7  min 

18  min 

694  min 

1.5  M 

38  ms 

(18  +  47)  MB 

Table  2.3:  Performance  statistics.  Abbreviations  are  as  follows  :  “#objs.”  denotes  the  number  of  objects 
in  the  scene,  “^freq.”  is  the  number  of  frequency  samples  in  the  range  [0-1  kHz]  and  srcs”  is  the  number 
of  sound  sources.  For  the  precomputation  stage,  the  term  “wave  sim.”  is  the  total  simulation  time  of  the 
numerical  wave  solver  for  all  frequencies,  “per-object”  denote  the  compute  time  for  the  per-object  transfer 
function  for  all  unique  objects  and  “inter-object”  denote  the  compute  time  for  inter-object  transfer  functions 
for  all  object  pairs,  “source-field”  is  time  to  express  each  sound  source  in  terms  of  incoming  multipoles  for  all 
objects,  and  “global-solve”  is  time  to  compute  equivalent  source  strengths  for  all  objects.  The  “wave  sim.” 
step  is  parallelized  over  all  unique  objects  whereas  the  remaining  precomputation  steps  are  parallelized  over 
all  frequencies.  The  term  “wall-elk  time”  is  the  total  wall-clock  time  computed  by  uniformly  distributing  all 
the  parallel  processes  over  all  the  cores  of  the  64- node  cluster  with  8  cores  per  node  (512  cores  in  total).  At 
runtime,  the  total  number  of  equivalent  sources  “ #  eq.  srcs”  (in  million  M),  performance  “eval.”  and  storage 
requirement  “storage”  (fixed  and  per  source  cost)  for  all  objects  for  all  frequencies  are  also  specified.  For 
column  “#objs.” ,  notation  a*  +  b*  denotes  that  first  object  has  been  instanced  a  times  and  second  object 
instanced  b  times,  but  their  per-object  transfer  functions  are  computed  only  once  for  each  unique  object. 


Air  Absorption  High-frequency  sounds  are  absorbed  more  aggressively  by  the  atmosphere  than  low  fre¬ 
quencies.  This  frequency  dependent  air  absorption  is  currently  not  modeled  by  our  technique.  However,  it 
can  be  included  as  a  post-processing  step  to  our  auralization  pipeline.  Since  we  compute  complex  frequency 
responses  containing  phase  information,  propagation  delays  are  modeled.  These  delays  yield  the  propagation 
distances  which  can  be  used  to  calculate  and  apply  a  per-frequency  attenuation  filter  in  frequency  domain 
to  model  atmospheric  absorption. 


2.7  Results  and  Analysis 


In  this  section,  we  present  the  results  of  our  technique  on  different  scenarios,  provide  error  analysis  and 


compare  it  with  prior  work. 


2.7.1  Scenarios 


We  have  considered  a  variety  of  scenes  for  testing  our  technique.  For  auralizations  corresponding  to  the 


scenes  discussed  below,  refer  to  the  supplementary  video. 
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Single  Object  We  considered  various  objects  having  different  scattering  characteristics:  rocks,  a  wall, 
and  a  concave  reflector.  The  rocks  scatter  sound  in  all  directions  (see  Figure  2.4).  We  show  magnitude  of 
the  scattered  sound  field  for  the  wall  generated  by  our  technique  and  by  BEM  in  Figure  2.5.  As  shown  in 
the  figure,  the  wall  strongly  scatters  sound  in  the  direction  perpendicular  to  itself.  As  a  more  challenging 
scene,  the  magnitude  of  scattered  sound  field  for  a  concave  reflector  is  also  shown.  The  reflector  generates 
significant  interference  effects,  resulting  in  caustic  formation  in  the  focal  region.  This  is  clearly  captured 
by  our  technique,  as  the  high  amplitude  (red)  region  in  the  figure,  showing  that  our  technique  is  able  to 
approximate  the  phase  of  the  scattered  wave  field  with  reasonable  accuracy.  The  relative  error,  defined  in 
Figure  2.6  caption,  between  the  total  pressure  fields  generated  by  our  technique  and  by  the  BEM  technique 
is  less  than  2%  for  the  wall  and  5%  for  the  concave  reflector. 

Parallel  Buildings  This  scene  consists  of  two  buildings  situated  parallel  to  one  another.  We  show  two 
walkthroughs  of  this  scene,  with  a  flying  helicopter,  and  a  person  speaking,  respectively.  As  the  helicopter 
moves  behind  a  building,  diffraction  leads  to  a  distinct  low-pass  occlusion  effect.  The  two  walls  trap  sound 
between  them,  producing  high-order  reflections,  so  that  the  volume  of  someone  talking  between  the  buildings 
is  markedly  louder  than  someone  standing  even  slightly  to  the  side. 

Desert  This  is  a  large  scene  with  three  sound  sources  spread  throughout  the  scene:  a  jeep,  a  bird,  and 
a  radio.  As  the  listener  walks  through  the  scene,  the  sound  received  from  the  various  sources  changes 
significantly  depending  on  whether  or  not  the  listener  is  in  the  line-of-sight  of  the  source(s).  We  also 
specifically  demonstrate  the  effect  of  second-order  diffracted  occlusion  of  the  jeep  sound  around  two  buildings. 

Christmas  Town  This  scene  demonstrates  sound  propagation  in  a  village  with  many  houses,  a  church,  a 
bell  tower  and  large  buildings.  It  shows  reflection  from  buildings,  diffraction  around  houses,  sound  propa¬ 
gation  over  large  distances  from  the  bell  tower,  and  reflections  between  two  parallel  buildings,  for  multiple 
sources. 
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Reservoir  We  show  that  our  technique  can  be  integrated  with  an  existing  game  ( Half-Life  2)  to  generate 
realistic  wave  acoustic  effects  in  a  large,  outdoor  game  map.  Our  method  is  the  first  wave-based  sound 
propagation  technique  that  can  accurately  model  wave  phenomena  such  as  diffraction  behind  the  rocks  and 
scattering  around  buildings  over  large  distances  on  such  a  scene  in  real-time. 

In  our  single-object  examples,  helicopter  behind  rock,  the  occluder  is  placed  in  isolation  without  any  sur¬ 
rounding  objects.  Due  to  the  lack  of  reflections  from  surrounding  objects,  and  the  fact  that  high  frequencies 
do  exhibit  quite  sharp  shadows,  our  diffraction  and  occlusion  effects  may  sound  exaggerated,  compared  to 
real  life. 

2.7.2  Error  Analysis 

Figure  2.6  shows  the  convergence  of  our  method  as  the  error  threshold  a  decreases.  Since  the  number  of 
outgoing  equivalent  sources  is  inversely  proportional  to  a ,  it  also  shows  convergence  of  the  technique  with 
increasing  number  of  outgoing  equivalent  sources.  We  also  plot  the  variation  in  the  number  of  outgoing 
equivalent  sources  with  frequency  to  achieve  given  error  thresholds  (see  Figure  2.7).  In  Figure  2.8,  we 
compare  the  results  of  our  ESM  technique  with  the  reference  wave  solver  ARD  [124],  BEM  [35],  and  FMM- 
BEM  [92,  54]  techniques  on  a  spatial  grid  of  listeners  at  different  frequencies  for  the  two  parallel  walls  scene. 
We  used  the  state-of-the-art  FastBEM  simulator  (http://www.fastbem.com/)  for  generating  BEM  and  FMM- 
BEM  results  up  to  the  maximum  frequency  possible  (358  Hz).  This  scene  is  acoustically  very  complex,  even 
though  individual  objects  seem  simple,  since  there  are  multiple  orders  of  interaction  happening  between 
these  two  walls.  Our  approach  handles  such  effects  accurately  and  produces  good  approximations,  up  to  the 
user-specified  error  thresholds,  while  reducing  memory  usage  by  orders  of  magnitude  (see  Table  2.4). 

2.7.3  Computational  Complexity 

Consider  a  scene  with  n  objects.  To  perform  analysis  for  frequency  is,  let  the  number  of  offset  surface  samples, 
incoming  equivalent  sources  and  outgoing  equivalent  sources  at  this  frequency  be  n,  Q  and  P  respectively. 
We  assume  that  all  objects  have  equal  volume  u. 
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Pre-processing 

Scattering  Matrix:  For  each  of  the  QM 2  incoming  multipoles  of  an  object,  wave  simulations  are  performed 
and  a  dense  linear  system  of  size  n  x  PN2  is  solved  to  find  the  object’s  scattering  matrix.  The  cost  for 
each  simulation  is  ulogu,  and  the  cost  of  solving  the  linear  system1  is  nP2N4.  Hence,  the  total  cost  is 
O  ( kQM 2  ( nP2N 4  +  ulogu)). 

Interaction  Matrix:  For  every  pair  of  objects,  PN 2  linear  systems  of  size  n  x  QM 2  need  to  be  solved  to 
find  the  interaction  matrix.  In  total,  we  have  k2  object  pairs.  The  cost  of  evaluating  analytical  expressions 
for  multipole  pressure  is  0(1)  each,  and  is  dominated  by  the  cost  of  solving  the  linear  systems.  Hence  the 
total  cost  is  O  (n2PN2nQ2M4) . 

The  size  of  these  linear  systems  vary  linearly  with  n,  which  in  turn  varies  quadratically  with  frequency 
(see  Section  2.5.5).  Thus,  ensuring  a  few  hours  precomputation  time  on  a  small  computational  cluster  (see 
Table  3)  limits  our  technique  to  1-2  kHz  on  typical  outdoor  scenes. 

Computing  Strengths:  The  incoming  field  produced  by  each  sound  source  is  represented  in  terms  of  the 
incoming  equivalent  sources  of  the  objects.  This  requires  solving  k  linear  system  of  size  n  x  QM 2  resulting 
in  cost  O  (nnQ2 M4).  The  size  of  the  final  linear  system  for  finding  outgoing  equivalent  source  strengths  for 
all  objects  in  response  to  a  sound  source  is  kPN2  x  kPN2  .  Solving  it  takes  O  ( k3P3N 6)  time. 

It  follows  that  the  total  pre-processing  cost  at  frequency  v  thus  scales  as 

0(kQM2  (nP2  N4  +  ulogu 

+  kPN2uQM2  +  nQM2)  +  k3P3N6) 

Runtime  At  runtime,  we  evaluate  equation  (2.19),  which  takes  O  ( kPN 2)  at  frequency  v.  The  runtime 
memory  requirement  consists  of  positions  (3  floats)  and  (complex-valued)  strengths  (2  floats)  of  equivalent 
sources,  which  comes  out  to  be  k(3 P  +  2PN 2)  at  frequency  u. 

The  precomputation  and  runtime  complexity  and  memory  requirement  depend  on  the  number  of  equiv¬ 
alent  sources  P,  which  scales  quadratically  with  frequency,  in  an  asymptotic  sense.  However,  for  practical 

1To  solve  a  dense  linear  system  of  size  m  x  n  ( m  >  n),  the  cost  is  mn2 
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objects  and  frequencies  up  to  1  kHz,  we  observed  linear  scaling  of  equivalent  sources  with  frequency,  as 
shown  in  Figure  2.7. 

We  have  to  compute  the  pressure  at  the  listener  position  over  a  regularly  sampled  set  of  frequencies  in 
the  range  [0,  vmax\  with  a  step  of  Av.  The  total  number  of  frequency  samples  becomes  vmax/ Av.  Thus, 
the  above  expressions  are  summed  for  all  frequency  samples  in  this  range  to  give  the  total  computational 
complexity  and  memory  requirement.  Since  the  computational  cost  and  runtime  memory  scales  with  the 
multipole  order,  we  limit  equivalent  sources  to  monopoles  and  dipoles,  i.e.,  N=M= 2.  Low  multipole  orders 
(N,  M)  result  in  a  larger  number  of  equivalent  sources  for  satisfying  the  same  error  thresholds.  However, 
since  we  place  only  as  many  equivalent  sources  as  required,  low  multipole  order  does  not  effect  the  quality 


of  the  final  result.  The  theoretical  runtime  memory  requirements  for  other  wave-solvers  are  discussed  in 


Appendix  2.9.3.  We  also  compare  the  runtime  memory  requirements  of  these  solvers  with  our  technique  on 


a  variety  of  scenes  (see  Table  2.4). 


Scene 

air. 

vol. 

surf. 

area 

FDTD 

ARD 

BEM/ 

FMM 

Ours 

Concave 

(85m)3 

107  m2 

33  TB 

0.9  TB 

0.5  GB 

5  MB 

Wall 

(85m)3 

71  m2 

33  TB 

0.9  TB 

0.3  GB 

7  MB 

Rock 

(85m)3 

159  m2 

33  TB 

0.9  TB 

0.8  GB 

15  MB 

Parallel 

(85m)3 

142  m2 

33  TB 

0.9  TB 

0.7  GB 

14  MB 

Desert 

(180m)3 

1626  m2 

625  TB 

17  TB 

15  GB 

45  MB 

Reservoir 

(180m)3 

950  m2 

625  TB 

17  TB 

9  GB 

56  MB 

Christmas 

(180m)3 

2953  m2 

625  TB 

17  TB 

27  GB 

65  MB 

Table  2.4:  Runtime  memory  requirements  per  source,  for  FDTD  [145],  ARD  [124],  BEM/FMM-BEM  [92], 
and  our  ESM  technique  with  error  thresholds  cr  =  15%,  77  =  1%  at  maximum  simulation  frequency  ^max  = 
1018 Hz.  Refer  to  Section  2.7.3  and  Appendix  2.9.3  for  more  details. 


Our  precomputation  step  is  computationally  heavy  and  takes  a  few  hours  to  run  on  a  CPU-cluster 
(Table  2.3).  But  this  step  is  trivially  parallel  and  could  be  performed  easily  on  cheap  and  widely  available 
cloud  computing  resources,  such  as  Amazon  EC2.  Moreover,  our  current  implementation  is  in  MATLAB 
and  we  expect  lOx  improvement  with  an  optimized  C++  implementation. 


2.7.4  Comparison  with  Prior  Interactive  Techniques 


Our  usage  of  equivalent  sources  for  sound  propagation  is  in  a  similar  vein  to  prior  work  [71],  where  the 


authors  represent  arbitrary  outgoing  radiation  field  from  a  single,  geometrically  complex  object.  Our  work 
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differs  primarily  in  three  regards:  First,  we  model  mutual  interactions  between  objects  in  arbitrary  scenes 
using  inter-object  transfer  functions,  accounting  for  high-order  interactions,  such  as  echoes  and  multiple 
diffraction.  Secondly,  we  model  acoustic  scattering  from  objects  (as  opposed  to  radiation),  which  requires 
an  approximation  of  both  the  incoming  and  outgoing  pressure  fields  for  an  object.  Finally,  our  outgoing 
equivalent  sources  are  chosen  to  satisfy  multiple  outgoing  scattered  fields  as  opposed  to  a  single  radiation 
held. 

The  problem  of  real-time  acoustic  scattering  has  been  previously  addressed  using  GPUs  [156] .  First-order 
scattering  effects  are  incorporated,  but  acoustic  interactions  between  objects  are  not  modeled.  In  contrast, 
our  work  can  handle  all  orders  of  interactions  between  the  objects  using  inter-object  transfer  functions. 

A  recent  technique  for  interactive  acoustics  based  on  wave  simulation  was  proposed  in  Raghuvanshi  et 
al.  [125],  which  relies  on  sampling  the  volume  of  the  scene,  and  uses  a  perceptual  compression  specific  to 
indoor  scenes.  The  runtime  memory  requirement  of  their  technique  (per  source)  on  our  scenes  (assuming  a 
spatial  sampling  of  lm)  is  187  MB  for  the  parallel  walls  and  1.8  GB  for  the  reservoir  scene.  This  technique 
is  complimentary  to  our  approach;  it  works  best  in  indoor  spaces  with  a  lot  of  geometric  clutter  but  limited 
volume,  while  our  technique  is  better  suited  to  outdoor  spaces  with  well-separated  objects.  In  fact,  it  would 
be  quite  natural  to  integrate  this  method  with  ours,  with  the  indoor  and  outdoor  propagation  models  coupled 
through  transport  operators  defined  on  doors  and  windows. 

2.8  Conclusions  and  Discussion 

We  have  presented  a  novel  wave-based  sound  propagation  algorithm  that  captures  acoustic  effects  such  as 
high-order  diffraction  and  scattering,  using  an  equivalent  source  formulation.  As  a  result,  our  technique 
can  perform  accurate  sound  propagation  on  large,  open  scenes  in  real-time,  has  a  small  memory  footprint, 
and  allows  flexible  efficiency-to-accuracy  tradeoffs.  Compared  to  directly  storing  and  convolving  wave-solver 
solutions  for  auralization,  we  reduce  the  memory  usage  more  than  100  times. 

Our  approach  is  currently  limited  to  static  scenes,  due  to  the  computational  cost  of  recomputing  inter- 
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object  transfers  as  objects  move.  We  would  like  to  combine  our  approach  with  Fast  Multipole  Method 
(FMM)[92,  54]  to  accelerate  inter-object  transfer  evaluations  using  progressive  far-held  approximations. 
Moreover,  real-time  performance  could  be  achieved  by  further  using  GPU-based  dense  linear  solvers.  The 
incoming  held  strength  computation  for  a  moving  source  is  similar  to  inter-object  transfer.  Thus,  the 
combination  of  FMM  and  GPU-based  computations  could  enable  dynamic  sources  along  with  a  moving 
listener.  Also,  our  current  runtime  system  does  not  model  Doppler  effect,  which  we  would  like  to  address  in 
future  work. 

Currently,  our  sound  sources  emit  a  pre-recorded  audio  clip.  An  interesting  direction  of  future  research 
would  be  to  integrate  acoustic  radiators  based  on  mechanical  physical  models  [181,  31]  as  potential  sound 
sources,  thus  enabling  physically-based  real-time  sound  synthesis  and  propagation. 

The  computational  complexity  and  runtime  memory  requirement  of  our  technique  scale  linearly  with 
number  of  frequency  samples  which  in  turn  scales  linearly  with  the  scene  size  (number  of  frequency  samples 
oc  length  of  impulse  response  oc  scene  size).  Thus,  our  technique  can  easily  handle  scenes  that  are  hundreds 
of  meters  wide.  However,  for  massive  outdoor  scenes  that  span  kilometers,  the  runtime  memory  requirement 
would  become  too  high  (GBs  per  source).  We  plan  to  address  this  in  future  by  using  FMM-based  far  field 
approximations  to  reduce  the  number  of  equivalent  sources. 

Our  precomputation  depends  heavily  on  the  maximum  simulation  frequency  thereby  limiting  it  to  1-2 
kHz.  This  behavior  is  consistent  with  other  wave-based  techniques  like  BEM  and  FDTD,  which  are  also 
computationally  limited  to  a  few  kHz.  Geometric  approximations  become  quite  accurate  for  outdoor  scenes 
at  higher  frequencies  because  buildings  and  terrain  have  much  larger  dimensions  than  the  wavelength  of  17cm 
at  2kHz.  Thus,  hybridization  of  our  technique  with  geometric  methods  could  lead  to  accurate  wide-band 
propagation  techniques  for  open  scenes.  Hybridization  is  an  active  area  of  research  in  acoustics  [140]. 
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2.9  Appendix 


2.9.1  Two-object  Steady  State  Field 

We  describe  in  detail  the  way  we  compute  the  equivalent  source  strengths  for  a  scene  composed  of  two 
objects.  Consider  a  scene  with  objects  A  and  B  and  a  sound  source  s.  Let  the  incoming  field  multipoles 
for  A  and  B  be  and  ,  respectively.  Similarly,  let  the  multipoles  for  the  outgoing  field  for  A  and  B 
be  and  respectively.  The  scattering  matrices  for  A  and  B  are  T a  and  Tb,  respectively.  Let  the 
interaction  matrices  for  the  objects  be  G^  and  Gj),  respectively.  First  of  all,  we  express  the  incoming  field 
produced  by  sound  source  s  on  objects  A  and  B  in  terms  of  their  incoming  field  multipoles  : 

Q  M2 

An  \  ^  \  ^  _  .An  otr^in. 

SA  —  /  ^  ^  aik^ik  —  ®A  > 

i= 1  k= 1 
Q  M2 

An  \  ^  \  '  u  .An  Q tr fain 

SB  =  bikVik  =  SB  • 

i=  1  k—1 

Now  assume  that  the  steady  state  outgoing  field  of  object  A  and  B  is  P™*  and  P^ut  respectively. 

P  N2 

poAut  =  EE  4  =  (2-20) 

J'= 1  h=  1 
P  N2 

rs*  =  EE  cfhVjh  =  Cb®™*-  (2-21) 

3  = 1  h= 1 

The  outgoing  field  of  one  object  becomes  the  incoming  field  for  the  other  object.  Exploiting  the  linearity 
of  the  inter-object  transfer  function  and  (2.14),  we  find  the  incoming  field  for  B  produced  by  the  outgoing 
field  of  A  as 

P4m  ( s~itr  _  r^tr  /rym 

B  —  9a\^a^a 

Similarly,  we  find  the  incoming  field  for  A  produced  by  the  outgoing  field  of  B  as 


P4m  „A  ( s-itr  fr.out\  s~itr  /~iA  ^-.in 

a  —Qbv^b^b  j  —  ^b^b^a 
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The  total  incoming  fields  on  objects  A  and  B  are  given  by 


Pi  in  _  An  ,  T>in  _  ntr^-in  .  ritrr^AftJin. 

A  ~  SA+^A  ~  &A  ^A  +  L'B  ^B^A  i 

Pin  _  An  ,  T>in  _  otr  fain  .  /^itr  ^in 

B  ~  SB  +  —  ^B  ^B  +  '-  A  ^A  ^B  ■ 

Applying  the  linearity  of  per-object  transfer  function  f  and  using  (2.9),  we  get  outgoing  pressure  P^ut 
and  P°ut  due  to  the  scattering  of  incoming  fields  by  the  objects  as 

P°Aut  =  /( Pa )  =  (S^Ta  +  C%GiTA)*%\  (2.22) 

P°BUt  =  f(PlBn)  =  (SbTb  +  C^G^Tb)^.  (2.23) 

In  steady  state,  this  outgoing  pressure  should  match  the  outgoing  pressure  we  started  with.  Equating 
(2.22)  with  (2.20),  and  (2.23)  with  (2.21),  we  get 

Ctr  Qtrrji  ,  s~itr  /~iA  rp  . 

A  ~  ^A  1 A  -r  Ub±  A? 

Ctr  rttrrn  i  r^tr  s~i  B  rp 

B  —  &  B  1 B  +  ^  A^  A1  B- 

Combining  the  above  two  equations,  and  rearranging,  we  obtain 
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In  other  words, 

C  =  T(GC  +  S),  (2.24) 

or, 

(I  —  TG)C  =  TS,  (2.25) 
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which  is  a  linear  system  Ax  =  b.  We  solve  this  linear  system  to  get  the  outgoing  equivalent  source  strengths 
C.  At  runtime,  the  outgoing  scattered  field  at  any  listener  position  x  is  given  by 

p(x)  =  (x)  +  ClfSS^x).  (2.26) 


The  total  pressure  held  becomes 

p(x)  =  C>r  (x)  +  (x)  +  s(x).  (2.27) 

2.9.2  Multiple  Objects  Steady  State  Field 

For  a  scene  with  n  objects,  Ax,A2,  ...,An,  equation  (2.25)  remains  the  same  except  the  vectors  and  matrices 
are  generalized  for  k  objects. 

The  total  pressure  held  becomes 


P(x)=£C%*2?(x)  +  a(x).  (2.28) 

3=1 

2.9.3  Computational  Complexity 

BEM  The  storage  requirements  of  BEM  depends  on  the  total  surface  area  S  of  the  objects  in  the  scene 
and  the  number  of  frequency  samples  r'max/ At'.  Assuming  BEM  places  r  samples  per  wavelength  (usually 
r  =  12),  the  number  of  BEM  elements  placed  on  the  object’s  surface  at  frequency  sample  //,;  is  equal  to 
St2v2 /c2.  The  total  number  of  BEM  elements  for  all  the  frequency  samples  is  equal  to  S'r2r'))lax/(3c2Azz), 
where  each  element  is  specihed  by  its  position  (3  boats)  and  four  complex  amplitudes  corresponding  to 
pressure  and  its  gradient  (2  boats  each) .  Total  memory  requirement  of  storing  the  simulation  results  becomes 

llS'r2r'^lax/(3c2Ai/). 


55 


ARD  and  FDTD  The  runtime  memory  requirements  of  ARD  and  FDTD  are  equal  to  the  number  of 
grid  cells  in  the  spatial  discretization  of  the  entire  volume  of  the  scene  and  the  number  of  timesteps  in  the 
simulation.  Assuming  volume  of  the  scene  to  be  V,  the  grid  size  h,  the  maximum  frequency  t'max)  the  speed 
of  sound  c,  and  the  number  of  samples  per  wavelength  r  (equal  to  3  for  ARD  and  10  for  FDTD),  the  number 
of  grid  cells  are  (r^max/c)3F.  The  total  number  of  time  samples  to  store  is  at  least  twice  the  number  of 
samples  in  the  frequency  domain.  The  total  memory  requirement  of  storing  the  simulation  results  for  these 
techniques  is  thus 

2r34axF/(c3A^). 
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Figure  2.5:  We  show  the  scattering  behavior  of  a  wall  (2.3 m  x  4.5 m  x  3.7 to)  and  a  concave  reflector  (diameter 
8m,  thickness  1.2m)  at  160 Hz  using  our  technique  (top  row)  and  BEM  (bottom  row).  The  sound  source  is 
shown  with  a  red  dot. 
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Figure  2.6:  Convergence:  We  show  the  variation  of  error  \\Pref  —  ^5ESM||2/||^re/||2  between  the  reference 
wave  solver  and  our  technique  for  varying  values  of  scattering  matrix  error  threshold  er  for  the  two  parallel 
walls  scene  (fixed  r)  =  1%).  Pref  and  Pesm  are  vectors  consisting  of  (complex)  pressure  values  at  all  the 
receiver  locations  as  given  by  the  reference  wave  solver  and  our  technique,  respectively.  The  receivers  are 
placed  on  a  XY  grid  for  this  scene. 
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Figure  2.7:  Variation  of  the  number  of  outgoing  equivalent  sources  with  frequency,  for  four  different  objects. 
As  the  frequency  increases  (wavelength  decreases),  surface  details  of  the  size  of  the  wavelength  increase  the 
complexity  of  the  sound  field.  This  results  in  a  larger  number  of  equivalent  sources.  When  all  the  details  of 
the  object  are  captured,  increasing  the  frequency  has  little  effect  and  the  number  of  equivalent  sources  begin 
to  stabilize.  Error  thresholds  are  er  =  15%  and  rj  =  1%. 
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Figure  2.8:  Comparison  between  the  magnitude  of  the  total  pressure  field  (in  Sound  Pressure  Level  SPL, 
units  dB)  computed  by  our  ESM,  the  reference  wave-solver  ARD,  BEM  and  FMM-BEM  techniques  for  the 
two  parallel  walls  scene  on  a  XY  cutview  grid  of  listeners.  The  red  point  denotes  the  position  of  the  sound 
source.  The  error  (defined  in  Figure  2.6  caption)  between  the  ARD-ESM  fields  is  <  3%,  BEM-ESM  fields  is 
<  5%  and  FMM  BEM-ESM  fields  is  <  5%  for  the  frequencies  shown. 
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Chapter  3 


Wave-Ray  Coupling  for  Interactive 
Sound  Propagation  in  Large  Complex 
Scenes 

3.1  Overview 

We  present  a  novel  hybrid  approach  that  couples  geometric  and  numerical  acoustic  techniques  for  interactive 
sound  propagation  in  complex  environments.  Our  formulation  is  based  on  a  combination  of  spatial  and 
frequency  decomposition  of  the  sound  field.  We  use  numerical  wave-based  techniques  to  precompute  the 
pressure  field  in  the  near-object  regions  and  geometric  propagation  techniques  in  the  far-held  regions  to 
model  sound  propagation.  We  present  a  novel  two-way  pressure  coupling  technique  at  the  interface  of 
near-object  and  far-held  regions.  At  runtime,  the  impulse  response  at  the  listener  position  is  computed 
at  interactive  rates  based  on  the  stored  pressure  held  and  interpolation  techniques.  Our  system  is  able  to 
simulate  high-fidelity  acoustic  effects  such  as  diffraction,  scattering,  low-pass  filtering  behind  obstruction, 
reverberation,  and  high-order  reflections  in  large,  complex  indoor  and  outdoor  environments  and  Half-Life 
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2  game  engine.  The  pressure  computation  requires  orders  of  magnitude  lower  memory  than  standard  wave- 
based  numerical  techniques. 

3.2  Introduction 

Sound  propagation  techniques  are  used  to  model  how  sound  waves  travel  in  the  space  and  interact  with  var¬ 
ious  objects  in  the  environment.  Sound  propagation  algorithms  are  used  in  many  interactive  applications, 
such  as  computer  games  or  virtual  environments,  and  offline  applications,  such  as  noise  prediction  in  urban 
scenes,  architectural  acoustics,  virtual  prototyping,  etc..  Realistic  sound  propagation  that  can  model  differ¬ 
ent  acoustic  effects,  including  diffraction,  interference,  scattering,  and  late  reverberation,  can  considerably 
improve  a  user’s  immersion  in  an  interactive  system  and  provides  spatial  localization  [20] . 

The  acoustic  effects  can  be  accurately  simulated  by  numerically  solving  the  acoustic  wave  equation. 
Some  of  the  well-known  solvers  are  based  on  the  boundary-element  method,  the  finite-element  method, 
the  finite-difference  time-domain  method,  etc.  However,  the  time  and  space  complexity  of  these  solvers 
increases  linearly  with  the  volume  of  the  acoustic  space  and  is  a  cubic  (or  higher)  function  of  the  source 
frequency.  As  a  result,  these  techniques  are  limited  to  interactive  sound  propagation  at  low  frequencies  (e.g. 
l-2KHz)  [124,  98],  and  may  not  scale  to  large  environments. 

Many  interactive  applications  use  geometric  sound  propagation  techniques,  which  assume  that  sound 
waves  travels  like  rays.  This  is  a  valid  assumption  when  the  sound  wave  travels  in  free  space  or  when  the 
size  of  intersecting  objects  is  much  larger  than  the  wavelength.  As  a  result,  these  geometric  techniques  are 
unable  to  simulate  many  acoustic  effects  at  low  frequencies,  including  diffraction,  interference,  and  higher- 
order  wave  effects.  Many  hybrid  combinations  of  numeric  and  geometric  techniques  have  been  proposed,  but 
they  are  limited  to  small  scenes  or  offline  applications. 

Main  Results:  In  this  paper,  we  present  a  novel  hybrid  approach  that  couples  geometric  and  numerical 
acoustic  techniques  to  perform  interactive  and  accurate  sound  propagation  in  complex  scenes.  Our  approach 
uses  a  combination  of  spatial  decomposition  and  frequency  decomposition,  along  with  a  novel  two-way  wave- 
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ray  coupling  algorithm.  The  entire  simulation  domain  is  decomposed  into  different  regions,  and  the  sound 
field  is  computed  separately  by  geometric  and  numerical  techniques  for  each  region.  In  the  vicinity  of  objects 
whose  sizes  are  comparable  to  the  simulated  wavelength  (near-object  regions),  we  use  numerical  wave-based 
methods  to  simulate  all  wave  effects.  In  regions  away  from  objects  (far-held  regions),  including  the  free  space 
and  regions  containing  objects  that  are  much  larger  than  the  wavelength,  we  use  a  geometric  ray-tracing 
algorithm  to  model  sound  propagation.  We  restrict  the  use  of  numeric  propagation  techniques  to  small 
regions  of  the  environment  and  precompute  the  pressure  held  at  low  frequencies.  The  rest  of  the  pressure 
held  is  precomputed  using  ray  tracing. 

At  the  interface  between  near-object  and  far-held  regions,  we  need  to  couple  the  pressures  computed  by 
the  two  different  (one  numerical  and  one  geometric)  acoustic  techniques.  Rays  entering  a  near-object  region 
dehne  the  incident  pressure  held  that  serves  as  the  input  to  the  numerical  acoustic  solver.  The  numerical 
solver  computes  the  outgoing  scattered  pressure  held,  which  in  turn  has  to  be  represented  by  rays  exiting 
the  near-object  region.  At  the  core  of  our  hybrid  method  is  a  two-way  coupling  procedure  that  handles 
these  cases.  We  present  a  scheme  that  represents  two-way  coupling  using  transfer  functions  and  computes 
all  orders  of  interaction. 

The  key  results  of  this  work  include: 

1.  An  efficient  hybrid  approach  that  decomposes  the  scene  into  regions  that  are  more  suitable  for  either 
geometric  or  numerical  acoustic  techniques,  exploiting  the  strengths  of  both. 

2.  Novel  two-way  coupling  between  wave-based  and  ray-based  acoustic  simulation  based  on  fundamental 
solutions  at  the  interface  that  ensures  the  consistency  and  validity  of  the  solution  given  by  the  two 
methods.  Transfer  functions  are  used  to  model  two-way  couplings  to  allow  multiple  orders  of  acoustic 
interactions. 

3.  Fast,  memory- efficient  interactive  audio  rendering  that  only  uses  tens  to  hundreds  of  megabytes  of 
memory. 

We  have  tested  our  technique  on  a  variety  of  scenarios  and  integrated  our  system  with  the  Valve’s 
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Figure  3.1:  Our  hybrid  technique  is  able  to  model  high-fidelity  acoustic  effects  for  large,  complex  indoor  or 
outdoor  scenes  at  interactive  rates:  (a)  building  surrounded  by  walls,  (b)  underground  parking  garage,  and 
(c)  reservoir  scene  in  Half-Life  2. 


Source™game  engine.  Our  technique  is  able  to  handle  both  large  indoor  and  outdoor  scenes  (similar  to 
geometric  techniques)  as  well  as  generate  realistic  acoustic  effects  (similar  to  numeric  wave  solvers) ,  including 
late  reverberation,  high-order  reflections,  reverberation  coloration,  sound  focusing,  and  diffraction  low-pass 
filtering  around  obstructions.  Furthermore,  our  pressure  evaluation  takes  orders  of  magnitude  less  memory 
compared  to  state-of-the-art  wave  equation  solvers. 


3.3  Related  Work 

In  this  section,  we  present  a  brief  overview  of  related  work  on  sound  propagation  and  reverberation. 

3.3.1  Numerical  Acoustic  Techniques 

Accurate,  numerical  acoustic  simulations  typically  solve  the  acoustic  wave  equation  using  numerical  methods, 
such  as  finite  differences  [23],  finite  elements  [152],  boundary  elements,  or  adaptive  rectangular  decompo¬ 
sition  [124].  However,  their  time  and  space  complexity  increases  as  a  third  or  fourth  power  of  frequencies. 
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Despite  recent  advances,  they  remain  impractical  for  many  large  scenes. 

The  equivalent  source  [104],  expresses  the  solution  fields  of  the  wave  equation  in  terms  of  a  linear  combi¬ 
nation  of  points  sources  of  various  order  (monopoles,  dipoles,  etc).  James  et  al.  [71]  solved  a  related  sound 
radiation  problem,  using  equivalent  sources  to  represent  the  radiation  field  generated  by  a  vibrating  object. 

3.3.2  Geometric  Acoustic  Techniques 

Most  acoustics  simulation  software  and  commercial  systems  are  based  on  geometric  techniques  [45,  166] 
that  assume  sound  travels  along  linear  rays  [47].  The  simplified  assumption  of  rays  limits  these  methods  to 
accurately  capture  specular  and  diffuse  reflections  only  at  high  frequencies.  Diffraction  is  typically  modeled  by 
identifying  individual  diffracting  edges  [143,  158].  These  ray-based  techniques  can  interactively  model  early 
reflections  and  first  order  edge-diffraction  [150];  however,  they  cannot  interactively  model  the  reverberation 
of  the  impulse  response  explicitly,  since  that  would  require  high-order  reflections  and  wave  effects  such  as 
scattering,  interference,  and  diffraction.  While  ray-tracing  has  been  successfully  used  in  many  interactive 
acoustics  systems  [88] ,  the  number  of  rays  traced  has  to  be  limited  for  scenes  with  moving  listeners  in  order 
to  maintain  real-time  performance. 

3.3.3  Hybrid  Techniques 

Several  methods  for  combining  geometric  and  numerical  acoustic  techniques  have  been  proposed.  One  line 
of  work  is  based  on  frequency  decomposition:  dividing  the  frequencies  to  be  modeled  into  low  and  high 
frequencies.  Low  frequencies  are  modeled  by  numerical  acoustic  techniques,  and  high  frequencies  are  treated 
by  geometric  methods,  including  the  finite  difference  time  domain  method  (FDTD),  the  digital  waveguide 
mesh  method  (DWM),  and  the  finite  element  method  (FEM).  However,  these  methods  use  numerical  methods 
at  lower  frequencies  over  the  entire  domain.  As  a  result,  they  are  limited  to  offline  applications  and  may  not 
scale  to  very  large  scenes. 

Another  method  of  hybridization  is  based  on  spatial  decomposition.  The  entire  simulation  domain  is 
decomposed  to  different  regions:  near-object  regions  are  handled  by  numerical  acoustic  techniques  to  sim- 
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ulate  wave  effects,  while  far-field  regions  are  handled  by  geometric  acoustic  techniques.  Hampel  et  al.  [56] 
combine  the  boundary  element  method  (BEM)  and  geometric  acoustics  using  a  spatial  decomposition.  Their 
method  provides  a  one-way  coupling  from  BEM  to  ray  tracing,  converting  pressures  in  the  near-object  re¬ 
gion  (computed  by  BEM)  to  rays  that  enter  the  far-field  region  containing  the  listener.  In  electromagnetic 
wave  propagation,  Wang  et  al.  [169]  propose  a  hybrid  technique  combining  ray  tracing  and  FDTD.  Their 
technique  is  also  based  on  a  one-way  coupling,  where  rays  are  traced  in  the  far-field  region  and  collected 
at  the  boundaries  of  the  near-object  regions.  The  pressures  are  then  evaluated  and  serve  as  the  boundary 
condition  for  the  FDTD  method.  These  one-way  coupling  methods  do  not  allow  rays  to  enter  and  exit  the 
near-object  regions  of  an  object,  and  therefore  acoustic  effects  of  that  object  will  not  be  propagated  to  the 
far-field  regions.  Barbone  et  al.  [15]  propose  a  two-way  coupling  that  combines  the  acoustic  field  generated 
using  ray-tracing  and  FEM.  Jean  et  al.  [72]  present  a  hybrid  BEM/beam  tracing  approach  to  compute  the 
radiation  of  tyre  noise.  However,  these  methods  do  not  describe  how  multiple  entrance  of  rays  into  near¬ 
object  regions  of  different  objects  is  handled,  which  is  crucial  when  simulating  interaction  between  multiple 
objects. 

3.3.4  Acoustic  Kernel-Based  Interactive  Techniques 

There  has  been  work  in  enabling  interactive  auralization  for  acoustic  simulations  through  precomputation. 
At  a  high  level,  these  techniques  tend  to  precompute  an  acoustic  kernel,  which  is  used  at  runtime  for 
interactive  propagation  in  static  environments.  Raghuvanslii  et  al.  [125]  precompute  acoustic  responses  on  a 
sampled  spatial  grid  using  a  numerical  solver.  They  then  encode  perceptually  salient  information  to  perform 
interactive  sound  rendering.  Mehra  et  al.  [98]  proposed  an  interactive  sound  propagation  technique  for  large 
outdoor  scenes  based  on  equivalent  sources.  Other  techniques  use  geometric  methods  to  precompute  high- 
order  reflections  or  reverberation  [155,  8]  and  compactly  store  the  results  for  interactive  sound  propagation 
at  runtime.  Our  method  can  be  integrated  into  any  of  these  systems  as  an  acoustic  kernel  that  can  efficiently 
capture  wave  effects  in  a  large  scene. 
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Figure  3.2:  Overview  of  spatial  decomposition  in  our  hybrid  sound  propagation  technique:  In  the  precom¬ 
putation  phase,  a  scene  is  classified  into  objects  and  environment  features.  This  includes  near-object  regions 
(shown  in  orange)  and  far-held  regions  (shown  in  blue).  The  sound  held  in  near-object  regions  is  computed 
using  a  numerical  wave  simulation,  while  the  sound  held  in  far-held  region  is  computed  using  geometric 
acoustic  techniques.  A  two-way  coupling  procedure  couples  the  results  computed  by  geometric  and  nu¬ 
merical  methods.  The  sound  pressures  are  computed  at  different  listener  positions  to  generate  the  impulse 
responses.  At  runtime,  the  precomputed  impulse  responses  (IR0-IR3)  are  retrieved  and  interpolated  for  the 
specihc  listener  position  (IRt)  at  interactive  rates,  and  hnal  sound  is  rendered. 


3.4  Overview 


In  this  section  we  give  an  overview  of  sound  propagation  and  our  proposed  approach. 


3.4.1  Sound  Propagation 

For  a  sound  pressure  wave  with  angular  frequency  w,  speed  of  sound  c,  the  problem  of  sound  propagation  in 
domain  Q  in  the  space  can  be  expressed  as  a  boundary  value  problem  for  the  Helmholtz  equation  : 

V2p  +  —^P  =  /;  xe!l,  (3.1) 

where  p(x)  is  the  complex  valued  pressure  held,  V2  is  the  Laplacian  operator,  and  f(x)  is  the  source  term , 
(e.g.  =  0  in  free  space  and  <5(x')  for  a  point  source  located  at  x').  Boundary  conditions  are  specihed 
on  the  boundary  of  the  domain  (which  can  be  the  surface  of  an  solid  object,  the  interface  between 
different  media,  or  an  arbitrarily  dehned  surface)  by  a  Dirichlet  boundary  condition  that  specihes  pressure, 
p(x)  =  0;  x  £  dfl,  a  Neumann  boundary  condition  that  specihes  the  velocity  of  medium,  dpg^  =  0;  x  £  dCl, 
or  a  mixed  boundary  condition  that  specihes  a  complex- valued  constant  Z,  so  that  Z  +p(x)  =  0;  x  £  <911. 
The  pressure  p  at  inhnity  must  also  be  specihed,  usually  by  the  Sommerfeld  radiation  condition  [116], 


65 


liixL| |x| i-s-oo  +  jwcpJ  =  0,  where  ||x||  is  the  distance  of  point  x  from  the  origin  and  j  =  y/—l. 

Different  acoustic  techniques  aim  to  solve  the  above  equations  with  different  formulations.  Numerical 
acoustic  techniques  discretize  Equation  (3.1)  and  solve  for  p  numerically  with  boundary  conditions.  Geo¬ 
metric  acoustic  techniques  model  p  as  a  discrete  set  of  rays  emitted  from  sound  sources  which  interact  with 
the  environment  and  propagate  the  pressure. 

3.4.2  Acoustic  Transfer  Function 

When  modeling  the  acoustic  effects  due  to  objects  or  surfaces  in  a  scene,  it  is  often  useful  to  define  the 
acoustic  transfer  function.  Many  different  acoustic  transfer  functions  have  been  proposed  to  simulate  different 
acoustic  effects.  In  sound  propagation  problems,  the  acoustic  transfer  function  maps  an  incoming  sound 
held  to  an  outgoing  sound  held.  For  example,  Waterman  developed  a  transition-matrix  method  for  acoustic 
scattering  [171]  and  maps  the  incoming  and  outgoing  helds  in  terms  of  the  coefficients  of  a  complete  system 
of  vector  basis  functions.  Antani  et  al.  [8]  compute  an  acoustic  radiance  transfer  operator  that  maps  incident 
sound  to  diffusely  rehected  sound  in  a  scene.  Mehra  et  al.  [98]  model  the  free-held  acoustic  behavior  of  an 
object,  as  well  as  pairwise  interactions  between  objects.  In  sound  radiation  problems,  James  et  al.  [71]  map 
the  vibration  mode  of  an  object  to  the  radiated  sound  pressure  held. 

3.4.3  Hybrid  Sound  Propagation 

We  describe  the  various  components  of  our  hybrid  sound  propagation  technique.  Our  approach  uses  a 
combination  of  frequency  decomposition  and  spatial  decomposition,  as  shown  schematically  in  Figure  3.3. 
Since  frequency  decomposition  is  a  standard  technique  [51],  we  mostly  focus  on  spatial  decomposition  and 
our  novel  two-way  coupling  algorithm  (see  Figure  3.2). 

Frequency  Decomposition:  We  divide  the  modeled  frequencies  to  low  and  high  frequencies,  with  a 
crossover  frequency  i/max.  For  high  frequencies,  geometric  techniques  are  used  throughout  the  entire  domain. 
For  low  frequencies,  a  combination  of  numerical  and  geometric  techniques  is  used  based  on  a  spatial  de¬ 
composition  described  below.  Typical  values  for  ^max  are  0.5-2  kHz,  and  a  simple  low-pass-high-pass  hlter 
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Figure  3.3:  Frequency  and  spatial  decomposition.  High  frequencies  are  simulated  using  geometric  techniques, 
while  low  frequencies  are  simulated  using  a  combination  of  numerical  and  geometric  techniques  based  on  a 
spatial  decomposition. 

combination  is  usually  used  to  join  the  results  at  the  crossover  frequency  region. 

Spatial  decomposition:  Given  a  scene  we  first  classify  it  into  small  objects  and  environment  features. 
The  small  objects,  or  simply  objects,  are  of  size  comparable  to  or  smaller  than  the  wavelength  of  the  sound 
pressure  wave  being  simulated.  The  environment  features  represent  objects  much  larger  than  the  wavelength 
(like  terrain).  The  wavelength  that  is  used  as  the  criterion  for  distinguishing  small  or  large  objects  is  a  user- 
controlled  parameter.  One  possible  choice  is  the  maximum  audible  wavelength  (17  m),  corresponding  to 
the  lowest  audible  frequency  for  human  (20  Hz).  When  sound  interacts  with  objects,  wave  phenomena  are 
prominent  only  when  the  objects  are  small  relative  to  the  wavelength.  Therefore  we  only  need  to  compute 
accurate  wave  propagation  in  the  local  neighborhood  of  small  objects.  We  call  this  neighborhood  the  near¬ 
object  region  (orange  region  in  Figure  3.2)  of  an  object,  and  numerical  acoustic  techniques  are  used  to 
compute  the  sound  pressure  field  in  this  region.  The  region  of  space  away  from  small  objects  is  called  the 
far- field  region  and  is  handled  by  geometric  acoustic  techniques  (blue  region  in  Figure  3.2). 

The  spatial  decomposition  is  performed  as  follows:  For  a  small  object  A,  we  compute  the  offset  surface 
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d A+  and  define  the  near-object  region,  denoted  as  QN ,  as  the  space  inside  the  offset  surface.  The  offset 
surface  of  an  object  is  computed  using  discretized  distance  fields  and  the  marching  cubes  algorithm  similar 
to  James  et  al.  [71].  If  the  offset  surfaces  of  two  objects  intersect  then  they  are  treated  as  a  single  object 
and  are  enclosed  in  one  OjY.  The  space  complementary  to  the  near-object  region  is  defined  as  the  far-field 
region,  and  is  denoted  as  f lG . 

Geometric  acoustics:  The  pressure  waves  constituting  the  sound  field  in  ilG  are  modeled  as  a  discrete  set 
of  rays.  Their  propagation  in  space  and  interaction  with  environment  features  (e.g.  reflection  from  walls) 
are  governed  by  geometric  acoustic  principles.  We  denote  the  pressure  value  defined  collectively  by  the  rays 
at  position  x  as  pG(x), 

PG(X)  =  (3-2) 

reR 

where  pr  is  the  contribution  from  one  ray  r  in  a  set  of  rays  R. 

Numerical  acoustic  techniques:  The  sound  pressure  field  scattered  by  objects  in  QN  is  treated  by  wave- 
based  numerical  techniques  for  lower  frequencies,  in  which  the  wave  phenomena  such  as  diffraction  and 
interference  are  inherently  modeled.  We  denote  the  pressure  value  at  position  x  computed  using  numerical 
techniques  as  pN  (x) . 

Coupling:  At  the  interface  between  near-object  and  far-field  regions,  the  pressures  computed  by  the  two 
different  acoustic  techniques  need  to  be  coupled  (Figure  3.4).  Rays  entering  a  near-object  region  define  the 
incident  pressure  field  that  serves  as  the  input  to  the  numerical  solver.  Similarly,  the  outgoing  scattered 
pressure  field  computed  by  the  numerical  solver  must  be  converted  to  a  set  of  rays.  The  two-way  coupling  are 
modeled  as  transfer  functions  between  incoming  and  outgoing  rays.  The  process  is  detailed  in  Section  3.5. 
Pressure  computation:  At  each  frequency  lower  than  i/max,  the  coupled  geometric  and  numerical  methods 
are  used  to  solve  the  global  sound  pressure  field.  All  frequencies  higher  than  ^max  are  handled  by  geometric 
techniques  throughout  the  entire  domain. 

Acoustic  kernel:  The  previous  stages  serve  as  an  acoustic  kernel ,  which  computes  the  impulse  responses 
(IRs)  for  a  given  source-listener  position  pair.  For  each  sound  source,  the  pressure  value  at  each  listener 
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Figure  3.4:  Two-way  coupling  of  pressure  values  computed  by  geometric  and  numerical  acoustic  techniques, 
(a)  The  rays  are  collected  at  the  boundary  and  the  pressure  evaluated,  (b)  The  pressure  on  the  boundary 
defines  the  incident  pressure  field  p-lnc  in  ClN ,  which  serves  as  the  input  to  the  numerical  solver,  (c)  The 
numerical  solver  computes  the  scattered  field  psca,  which  is  the  effect  of  object  A  to  the  pressure  field,  (d) 
Psca  is  expressed  as  fundamental  solutions  and  represented  as  rays  emitted  to  ttG . 


position  is  evaluated  for  all  simulated  frequencies  to  give  a  complete  acoustic  frequency  response  (FR), 
which  can  in  turn  be  converted  to  an  impulse  response  (IR)  through  Fourier  transform.  IR’s  for  predefined 
source-listener  positions  (usually  on  a  grid)  are  precomputed  and  stored. 

Auralization:  At  runtime,  the  IR  for  a  general  listener  position  is  obtained  by  interpolating  the  neighboring 
precomputed  IR’s  [125],  and  the  output  sound  is  auralized  by  convoluting  the  input  sound  with  the  IRs  in 
real  time. 


3.5  Two-Way  Wave-Ray  Coupling 

In  this  section,  we  present  the  details  of  our  two-way  coupling  procedure.  We  also  highlight  the  precomputa¬ 
tion  and  runtime  phases.  The  coupling  procedure  ensures  the  consistency  between  pG  and  pN ,  the  pressures 
computed  by  the  geometric  and  numerical  acoustic  techniques,  respectively.  Any  exchange  of  information 
at  the  interface  between  fiG  and  QN  must  result  in  valid  solutions  to  the  Helmholtz  equation  (3.1)  in  both 
domains  and  f 1N . 

3.5.1  Geometric  — >  Numerical 

From  the  pressure  field  pG ,  we  want  to  find  the  incident  pressure  field  p-lnc,  which  serves  as  the  input  to  the 
numerical  solver  inside  ilN .  The  incident  pressure  field  is  defined  as  the  pressure  field  that  corresponds  to 
the  solution  of  the  wave  equation  if  there  were  no  objects  in  flN . 

Mathematically  p-lnc  is  the  solution  of  the  free-space  Helmholtz  Equation  (3.1)  with  forcing  term  /  =  0. 
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Since  there  is  no  object  in  domain  fiG, 

Pinc(x)  =pG(x);  x  G  flG .  (3.3) 

This  equation  defines  a  Dirichlet  boundary  condition  on  the  interface  dA+\ 

p  =  pG(x);  xGck4+,  (3.4) 

The  uniqueness  of  the  acoustic  boundary  value  problem  guarantees  that  the  solution  of  the  free-space 
Helmholtz  Equation,  along  with  the  specified  boundary  condition,  is  unique  inside  ttN .  The  unique  solution 
Pinc(x)  can  be  found  by  expressing  it  as  a  linear  combination  of  fundamental  solutions.  1  If  <pi(x)  is  a 
fundamental  solution,  and  p;nc(x)  is  expressed  as  a  linear  combination, 

Pine (x)  =  ^Ci<Pi(x)  X  G  flN,  (3.5) 

i 

then  the  linearity  of  the  wave  equation  implies  that  pinc(x)  is  also  a  solution.  Furthermore,  if  the  coefficients 
q  are  such  that  the  boundary  condition  (3.4)  is  satisfied,  then  p;nc(x)  is  the  required  unique  solution  to 
the  boundary  value  problem  (Section  3  in  Ochmann  [104]).  Therefore,  the  resultant  pressure  field  is  a  valid 
incoming  field  in  the  numerical  domain.  The  numerical  solver  takes  the  incident  pressure  field,  considers  the 
effect  of  the  object  inside  fiw,  and  computes  the  outgoing  scattered  field.  Figures  3.4(a)  and  3.4(b)  illustrate 
the  process. 

3.5.2  Numerical  — »  Geometric 

In  order  to  transfer  information  from  QN  to  fiG,  a  discrete  set  of  rays  must  be  determined  to  represent 
the  computed  pressure  pN .  These  outgoing  rays  may  be  emitted  from  some  starting  points  located  in  flN 
and  carry  different  information  related  to  the  modeled  pressure  waves  (strength,  phase,  frequency,  spatial 

1  A  fundamental  solution  F  for  a  linear  operator  L  (in  this  case  the  Helmholtz  operator  L  =  V2  +  )  is  defined  as  the 

solution  to  the  equation  LF  =  <5(x),  where  S  is  the  Dirac  delta  function  [165]. 
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derivatives  of  pressure,  etc.)  The  coupling  procedure  thus  needs  to  compute  the  appropriate  outgoing  rays, 
given  the  numerically  computed  pN . 

The  scattered  field  in  the  numerical  domain  due  to  the  object  can  be  simply  written  as, 

Psca(x)  =  pJV(x);  x£flN.  (3.6) 

We  need  to  find  the  scattered  field  outside  of  f 1N ,  and  model  it  as  a  set  of  rays.  As  before,  Equation  (3.6) 
defines  a  Dirichlet  boundary  condition  on  the  interface  (M+, 

p  =  pN(x)-  xe<9  A+.  (3.7) 

The  free  space  Helmholtz  Equation,  along  with  this  boundary  condition,  uniquely  defines  the  scattered  field 
Psca  outside  QN .  We  again  express  psca  as  a  linear  combination  of  fundamental  solutions  Pj : 

Psca(x)  =  y xgHg,  (3.8) 

3 

and  then  find  the  coefficients  Cj  by  satisfying  the  boundary  condition  (3.7).  This  gives  us  a  unique  solution 
for  scattered  field  psca(x)  outside  QN.  We  then  use  a  set  of  rays  R°ut  to  model  the  fundamental  solutions 
<Pj(x)  such  that 

i pj(x)=  Pr(x )>  xgHg.  (3.9) 

r£R°ut 

These  rays  correctly  represent  the  outgoing  scattered  field  in  f lG.  Figure  3.4(c)  and  3.4(d)  illustrate  the 
process. 

The  coupling  process  described  above  is  a  general  formulation  and  is  independent  of  the  underlying 
numerical  solver  (BEM,  FEM,  etc.)  that  is  used  to  compute  pN  as  long  as  the  pressure  on  the  interface 
dA+  can  be  evaluated  and  expressed  as  a  set  of  fundamental  solutions.  Depending  on  the  mathematical 
formulation  of  the  selected  set  of  fundamental  solutions  ipj(x),  different  rays  (starting  points,  directions, 
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information  carried,  etc.)  can  be  defined.  However,  a  general  principle  is  that  if  <Pj(x)  has  a  singularity  at 
yj .  then  y:l  is  a  natural  starting  point  from  which  rays  are  emitted.  The  directions  of  rays  sample  a  unit 
sphere  uniformly  or  with  some  distribution  function  (e.g.  guided  sampling  [150]).  The  choice  of  fundamental 
solutions  will  be  discussed  in  the  next  section. 

Note  that  if  the  fundamental  solutions  <pi  and  ipj  used  to  express  the  incident  field  (Equation  (3.5)) 
and  outgoing  scattered  field  (Equation  (3.8))  are  predetermined,  then  the  mapping  from  ip.i  to  ipj  can  be 
precomputed.  This  precomputation  process  will  be  discussed  in  section  3.5.4. 

3.5.3  Fundamental  solutions 

The  requirement  for  the  choice  of  fundamental  solution  ipj  is  that  it  must  satisfy  the  Helmholtz  Equation  (3.1) 
and  the  Sommerfeld  radiation  condition. 

Equivalent  Sources :  One  choice  of  fundamental  solutions  is  based  on  equivalent  sources  [104].  Each  funda¬ 
mental  solution  is  chosen  to  correspond  to  the  field  due  to  multipole  sources  of  order  L  (L  =  1  is  a  nronopole, 
L  =  2  is  a  dipole,  etc.)  located  at  y3: 

(Pj(x)  =  (3.10) 

for  l  <  L  —  1  and  —  l  <  m  <  l,  and 

( 2) 

'•Pjlm  ~  T lmh[  P j / c)lplm(d j  ,  (f>  j) ,  (3.11) 

where  ( Pj,0j ,  <pj)  corresponds  to  the  vector  (x— y^)  expressed  in  spherical  coordinates,  /i;2  (•)  is  the  complex¬ 
valued  spherical  Hankel  function  of  the  second  kind,  i pim(9j,(pj)  is  the  complex- valued  spherical  harmonic 
function,  and  T;m  is  the  real- valued  normalizing  factor  that  makes  the  spherical  harmonics  orthonormal  [10]. 
We  use  a  shorthand  generalized  index  h  for  ( l,m ),  such  that  ipjhpx-)  =  tfijimi x). 

For  pressure  fields  outside  of  dA+  (i.e.  in  flG),  these  equivalent  sources  are  placed  inside  of  dA+  (i.e.  in 
flN).  In  a  similar  fashion,  for  pressure  fields  inside  f lN ,  the  equivalent  sources  must  be  placed  outside  ClN . 
We  model  the  outgoing  pressure  field  from  these  equivalent  sources  using  rays  (Equation  (3.9))  as  follows. 
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Rays  are  emitted  from  the  source  location  y7 .  For  a  ray  of  direction  ( 6 ,  <f>)  that  has  traveled  a  distance  p,  its 
pressure  is  scaled  by  ipimiO , (j>)  and  h\  (ujp/c). 

Note  that  we  can  use  equivalent  sources  to  express  a  pressure  field  independently  of  how  the  pressure 
field  was  computed.  For  a  computed  pN ,  we  only  need  to  find  the  locations  y7  and  coefficients  Cj  of  the 
equivalent  sources.  This  is  performed  by  satisfying  the  boundary  condition  (3.8)  in  a  least  squared  sense. 
Boundary  Elements:  If  the  underlying  numerical  acoustic  technique  of  choice  is  the  boundary  element  method 
(BEM),  then  another  set  of  fundamental  solutions  which  is  directly  based  on  the  BEM  formulation  is  possible. 
For  a  domain  with  boundary  dfl,  the  boundary  element  method  solves  the  boundary  integral  equation  of 
the  Helmholtz  equation.  The  boundary  dQ  is  discretized  into  triangular  surface  elements,  and  the  equation 
is  solved  numerically  for  two  variables;  the  pressure  p  and  its  normal  derivative  ^  on  the  boundary.  Once 
the  boundary  solutions  p  and  ^  are  known,  the  sound  pressure  in  the  domain  can  be  found  for  any  point 
x  by  summing  all  the  contributions  from  the  surface  triangles: 

p<x)  =  L  (G(y’x,^r  "  ^sr^(y))  «*><y»-  (3-12) 

where  y  is  the  approximated  position  of  the  triangle  and  G  is  the  Green’s  Function  G(y,x)  =  exp(jw|x  — 
y|/c)/47r|x- y|  [55]. 

Note  that  the  discretization  of  Equation  (3.12)  also  takes  the  form  of  Equation  (3.8)  as  a  linear  combi¬ 
nation  of  fundamental  solutions: 

p(x)  =  (CM(X)  +  CMM)  >  (3-!3) 

3 

where  the  two  kinds  of  fundamental  solutions  are 


d(x) =  G(ybx) 


dn 


;  <^(x)  =  - 


9G{  yj, 
dn 


-p(  yj)- 


(3.14) 


Under  this  formulation,  we  can  represent  the  pressure  field  as  two  kinds  of  rays  emitted  from  each  triangle 
location  yj ,  each  modeling  92 j  (x)  and  (x)  respectively.  Then  for  a  point  in  the  pressure  field  defined 
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by  the  rays  is  computed  according  to  Equation  (3.12). 


3.5.4  Precomputed  Transfer  Functions 

If  we  consider  what  happens  in  QN  as  a  black  box,  the  net  result  of  the  coupling  and  the  numerical  solver 
is  that  a  set  of  rays  enter  QN  and  then  another  set  of  rays  exit  Ow: 

Rin  R out^  (3.1.5) 

where  Rm  is  the  set  of  incoming  rays  entering  Q'v,  Rout  is  the  set  of  outgoing  rays,  and  At  is  the  ray  transfer 
function.  In  this  case,  the  function  At  is  similar  to  the  bidirectional  reflectance  distribution  function  (BRDF) 
for  light  [17].  In  our  formulation,  At  encodes  all  the  operations  for  the  following  computations: 

1.  Collect  pressures  defined  by  R'n  to  form  the  incident  field  on  the  interface  (Equation  (3.4)); 

2.  Express  the  incident  field  as  a  set  of  fundamental  solutions  (Equation  (3.5)); 

3.  Compute  the  outgoing  scattered  field  using  the  numerical  acoustic  technique; 

4.  Express  the  outgoing  scattered  field  as  a  set  of  fundamental  solutions  (Equation  (3.8);  and  finally, 

5.  Find  a  set  of  rays  Rout  that  model  these  functions  (Equation  (3.9). 

A  straightforward  realization  of  hybrid  sound  propagation  technique  is  possible:  from  each  sound  source 
rays  are  traced,  interacting  with  the  environment  features,  entering  and  exiting  the  near-object  regions 
transfered  by  different  A4’s,  and  finally  reaching  a  listener.  However,  as  the  first  step  of  A4  depends  on  the 
incoming  rays  Rln,  a  different  A4  must  be  computed  each  time  the  rays  enter  the  same  near-object  region. 
Moreover,  the  process  must  be  repeated  until  the  solution  converges  to  a  steady  state,  which  may  be  too 
time-consuming  for  a  scene  (e.g.  an  indoor  scene)  with  multiple  ray  reflections  causing  multiple  entrances 
to  near-object  regions. 

While  previous  two-way  hybrid  techniques  do  not  consider  this  problem  [15,  72],  we  address  this  problem 
by  observing  that  if  the  fundamental  solutions  in  Step  2  (denoted  as  tpf1)  and  Step  4  (denoted  as  <p°ut)  are 
predefined,  then  we  can  precompute  the  results  of  Step  2-Step  5  for  an  object.  Similar  to  the  BRDF  for  light, 
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one  can  define  the  BRDF  for  sound.  The  mapping  of  <pjn  to  <p°ut  for  an  object  is  called  the  per-object  transfer 
function.  For  different  Rln  that  define  an  incident  field  p-lnc  on  the  interface,  we  only  need  to  compute  the 
expansion  coefficients  di  of  the  fundamental  solutions  tpf 1;  the  outgoing  rays  are  computed  by  applying  the 
precomputed  per-object  transfer  function. 

The  outgoing  scattered  field  that  is  modeled  as  outgoing  rays  from  an  object  A  may,  after  propagating 
in  space  and  interacting  with  the  environment,  enter  as  incoming  rays  into  the  near-object  region  of  another 
object  B.  For  a  scene  where  the  environment  and  relative  positions  of  various  objects  are  fixed,  we  can 
precompute  all  the  propagation  paths  for  rays  that  correspond  to  T’s  outgoing  basis  functions  and  that 
reach  B’s  near-object  region.  These  rays  determine  the  incident  pressure  field  arriving  at  object  B ,  which 
can  again  be  expressed  as  a  linear  combination  of  a  set  of  basis  functions  <^nB.  The  mapping  from  ip°u^ 
to  <f?B,  called  the  inter-object  transfer  function ,  which  is  a  fixed  function  and  can  also  be  precomputed. 
Interactions  between  multiple  objects  can  therefore  be  found  by  a  series  of  applications  of  the  inter-object 
transfer  functions. 

Based  on  the  per-object  and  inter-object  transfer  functions,  all  orders  of  acoustic  interaction  (correspond¬ 
ing  to  multiple  entrance  of  rays  to  near-object  regions)  in  the  scene  can  be  found  for  the  total  sound  field 
by  solving  a  global  linear  system,  which  is  much  faster  than  the  straightforward  hybridization,  where  the 
underlying  numerical  solver  is  invoked  multiple  times  for  each  order  of  interactions.  The  trade-off  is  that  the 
transfer  functions  have  to  be  precomputed.  However,  the  pre-object  transfer  functions  can  be  reused  even 
when  the  objects  are  moved.  This  characteristic  is  beneficial  for  quick  iterations  when  authoring  scenes, 
and  can  potentially  be  a  cornerstone  for  developing  sound  propagation  systems  that  supports  fully  dynamic 
scenes. 


3.6  Implementation 

In  this  section  we  discuss  the  implementation  aspect  for  our  technique. 
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Hybrid  Pressure  Solving  Pressure 


Numerical  Geometric  Evaluation 


Scene 

#src 

#freq. 

#eq.  srcs 

wave  sim. 

per-object 

inter-object 

source  + 
global  field 

#tris 

order 

#rays 

propagation 

time 

Building+small 

1 

300 

220  K 

163  min 

552  min 

22  min 

19  min 

60 

3 

4096 

41  min 

81  sec 

Building+medium 

1 

400 

290  K 

217  min 

736  min 

33  min 

23  min 

60 

3 

4096 

39  min 

81  sec 

Building+large 

1 

800 

580  K 

435  min 

1472  min 

54  min 

40  min 

60 

3 

4096 

39  min 

81  sec 

Reservoir 

1 

500 

500  K 

254  min 

252  min 

4  min 

2.6  min 

16505 

2 

262144 

1.9  min 

10  sec 

Parking 

2 

500 

123  K 

55  min 

40  min 

3  min 

0.9  min 

5786 

2 

4096 

6.6  min 

24  sec 

Table  3.1:  Precomputation  Performance  Statistics.  The  rows  “Building+small” ,  “Build- 

ing+medium” ,  and  “Building+large”  correspond  to  scenes  with  a  building  surrounded  by  small,  medium, 
and  large  walls,  respectively.  “Reservoir”  and  “Parking”  denote  the  reservoir  and  underground  parking 
garage  scene  respectively.  For  a  scene,  “#src”  denotes  the  number  of  sound  sources  in  the  scene,  “#freq.” 
is  the  number  of  frequency  samples,  and  “#eq.  srcs”  denotes  the  number  of  equivalent  sources.  The  first 
part,  “Hybrid  Pressure  Solving” .  includes  all  the  steps  required  to  compute  the  final  equivalent  source 
strengths,  and  is  performed  once  for  a  given  sound  source  and  scene  geometries.  The  second  part,  “Pressure 
Evaluation” ,  corresponds  to  the  cost  of  evaluating  the  contributions  from  all  equivalent  sources  at  a  listener 
position  and  is  performed  once  for  each  listener  position.  For  the  numerical  technique,  “wave  sim.”  refers  the 
total  simulation  time  of  the  numerical  wave  solver  for  all  frequencies;  “per-object”  denotes  the  computation 
time  of  for  per-object  transfer  functions;  “inter-object”  is  the  inter-object  transfer  functions  for  each  pair  of 
objects  (including  self-inter-object  transfer  functions,  where  the  pressure  wave  leaves  a  near-object  region 
and  reflected  back  to  the  same  object);  “source  +  global  solve”  is  the  time  to  solve  the  linear  system  to 
determine  the  strengths  of  incoming  and  outgoing  equivalent  sources.  For  the  geometric  technique,  tris” 
is  the  number  of  triangles  in  the  scene;  “order”  denotes  the  order  of  reflections  modeled;  “+  rays”  is  the 
number  of  rays  emitted  from  a  source  (sound  source  or  equivalent  source).  The  column  “propagation  time” 
includes  the  time  of  finding  valid  propagation  paths  and  computing  pressures  for  any  intermediate  step  (e.g. 
from  one  object  to  another  object’s  offset  surface). 
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3.6.1  Implementation  details 


The  geometric  acoustics  code  is  written  in  C++,  based  on  the  Impulsonic  sound  propagation  SDK,  which 
implements  a  ray-tracing  based  image  source  method.  For  the  numerical  acoustic  technique  we  use  a  GPU- 
based  implementation  of  the  ARD  wave-solver  [124],  Per-object  transfer  functions,  inter-object  transfer 
functions,  and  equivalent  source  strengths  are  computed  using  a  MATLAB  implementation  based  on  [98]. 

Table  3.1  provides  the  detailed  timing  results  for  the  precomputation  stage.  The  timings  are  divided 
into  two  groups.  The  first  group,  labeled  as  “Hybrid  Pressure  Solving,”  consists  of  all  the  steps  required  to 
compute  the  final  equivalent  source  strengths.  These  computations  are  performed  once  for  a  given  scene. 
The  second  group,  labeled  as  “Pressure  Evaluation,”  involves  the  computation  of  the  pressures  contributed 
by  all  equivalent  sources  at  a  listener  position.  This  computation  is  performed  once  for  each  sampled  listener 
position. 

The  timing  results  for  “wave  sim.”  (simulation  time  of  the  ARD  wave  solver),  and  “Pressure  Evaluation” 
are  measured  on  a  single  core  of  a  4-core  2.80  GHz  Xeon  X5560  desktop  with  4GB  of  RAM  and  NVIDIA 
GeForce  GTX  480  GPU  with  1.5  GB  of  RAM.  All  the  other  results  are  measured  on  a  cluster  containing 
a  total  of  436  cores,  with  sixteen  16-CPUs  (8  dual-core  2.8GHz  Opterons,  32GB  RAM  each)  and  forty-five 
4-CPU  (2  dual-core  2.6GHz  Opterons,  8GB  RAM  each). 

We  assume  the  scene  is  given  as  a  collection  of  objects  and  terrains.  In  the  spatial  decomposition  step, 
the  offset  surface  is  computed  using  distance  fields.  One  important  parameter  is  the  spatial  Nyquist  distance 
h,  corresponding  to  the  highest  frequency  simulated  i/max,  h  =  c/2i/max,  where  c  is  the  speed  of  sound. 
To  ensure  enough  spatial  sampling  on  the  offset  surface,  we  choose  the  voxel  resolution  of  distance  held  to 
be  h,  and  the  sample  points  are  the  vertices  of  the  surface  given  by  the  marching  cubes  algorithm.  The 
offset  distance  is  chosen  to  be  8 h.  In  general,  a  larger  offset  distance  means  a  larger  spatial  domain  for 
the  numerical  solver  and  is  therefore  more  expensive.  On  the  other  hand,  a  larger  offset  distance  results 
in  a  pressure  held  with  less  detail  (i.e.  reduced  spatial  variation)  on  the  offset  surface,  and  fewer  outgoing 
equivalent  sources  are  required  to  achieve  the  same  error  threshold. 
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3.6.2  Collocated  equivalent  sources 


The  positions  of  outgoing  equivalent  sources  can  be  generated  by  a  greedy  algorithm  that  selects  the  best 
candidate  positions  randomly  [71].  However,  if  each  frequency  is  considered  independently,  a  total  of  1 M 
or  outgoing  equivalent  sources  may  arise  across  all  simulated  frequencies.  Because  we  must  trace  Nr  rays, 
(typically  thousands  or  more)  from  each  equivalent  source,  this  computation  becomes  a  major  bottleneck  in 
our  hybrid  framework.  This  may  cause  a  computation  bottleneck  in  our  hybrid  framework,  because  we  need 
to  trace  Nr  rays  (typically  thousands  or  more)  from  each  equivalent  source. 

We  resolve  this  issue  by  reusing  equivalent  sources  positions  across  different  frequencies  as  much  as 
possible.  First,  the  equivalent  sources  for  the  highest  frequency  v maxi  which  requires  the  highest  number 
of  equivalent  sources,  Pmax ,  are  computed  using  the  greedy  algorithm.  For  lower  frequencies,  the  candidate 
positions  are  drawn  from  the  Pmax  existing  positions,  which  guarantees  that  a  total  of  Pmax  collocated 
positions  is  occupied.  Indeed,  when  the  path  is  frequency-independent,  rays  emitted  from  collocated  sources 
will  travel  the  same  path,  which  reduces  the  overall  ray-tracing  cost.  The  frequency- independent  path 
assumption  holds  for  paths  containing  only  specular  reflections,  in  which  case  the  incident  and  reflected 
directions  are  determined.  We  observe  a  60  —  100A  speedup  while  maintaining  the  same  error  bounds  over 
methods  without  the  collocation  scheme.  All  the  timings  results  in  this  section  are  based  on  this  optimization. 

3.6.3  Auralization 

We  compute  the  frequency  responses  using  our  spatial  decomposition  approach  up  to  ^max  =  1  kHz  with 
a  sampling  step  size  of  2.04  Hz.  For  frequencies  higher  than  we  use  a  ray  tracing  solution,  with 

diffractions  approximated  by  the  Uniform  Theory  of  Diffraction  (UTD)  model  [82].  We  join  the  low-  and 
high-frequency  responses  in  the  region  [800, 1000]  Hz  using  a  low-pass-higlr-pass  filter  combination. 

The  sound  sources  in  our  system  are  recorded  audio  clips.  The  auralization  is  performed  using  overlap- 
add  STFT  convolutions.  A  ’’dry”  input  audio  clip  is  first  segmented  into  overlapping  frames,  and  a  windowed 
(Blackman  window)  Short-Time  Fourier  transform  (STFT)  is  performed.  The  transformed  frames  are  mul¬ 
tiplied  by  the  frequency  responses  corresponding  to  the  current  listener  position.  The  resulting  frequency- 
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Scene 

#IR  samples 

Memory 

Time 

Building+small 

960 

19  MB 

3.5  ms 

Building+med 

1600 

32  MB 

3.5  ms 

Building+large 

6400 

128  MB 

3.5  ms 

Reservoir 

17600 

352  MB 

1.8  ms 

Table  3.2:  Runtime  Performance  on  a  Single  Core.  For  each  scene,  “#IR  samples”  denotes  the 
number  of  IR’s  sampled  in  the  scene  to  support  moving  listeners  or  sources;  “Memory”  shows  the  memory 
to  store  the  IR’s;  “Time”  is  the  total  running  time  needed  to  process  and  render  each  audio  buffer. 


domain  frames  are  then  transformed  back  to  time-domain  frames  using  inverse  FFT,  and  the  final  audio 
is  obtained  by  overlap-adding  the  frames.  For  spatialization  we  use  a  simplified  spherical  head  model  with 
one  listener  position  for  each  ear.  Richer  spatialization  can  be  modeled  using  head  related  transfer  functions 
(HRTFs),  which  are  easily  integrated  in  our  approach. 

For  the  interactive  auralization  we  implemented  a  simplified  version  of  the  system  proposed  by  Raghu- 
vanshi  et  al.  [125].  Only  the  listener  positions  are  sampled  on  a  grid  (of  0.5m-lm  grid  size),  and  the  sound 
sources  are  kept  static.  The  case  of  moving  sound  sources  and  a  static  listener  is  handled  using  the  principle 
of  acoustic  reciprocity  [116].  The  interactive  auralization  is  demonstrated  through  integration  with  Valve’s 
Source™game  engine.  Audio  processing  is  performed  using  FMOD  at  a  sampling  rate  of  44.1  kHz;  the  au¬ 
dio  buffer  length  is  4096  samples,  and  the  FFTs  are  computed  using  the  Intel  MKL  library.  The  runtime 
performance  statistics  are  summarized  in  Table  3.2.  The  parking  garage  scene  is  rendered  off-line  and  not 
included  in  this  table. 


3.7  Results  and  Analysis 


In  this  section  we  present  the  results  of  our  hybrid  technique  in  different  scenarios  and  error  analysis. 


3.7.1  Scenarios 

We  demonstrate  the  effectiveness  of  our  technique  in  a  variety  of  scenes.  The  scenes  are  at  least  as  complex 
as  those  shown  in  previous  wave-based  sound  simulation  techniques  [71,  124,  98]  or  geometric  methods  with 
precomputed  lrigh-order  reverberation  [155,  8].  Please  refer  to  the  supplementary  video  for  the  auralizations. 
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In  each  scene,  we  compare  the  audio  generated  by  our  method  with  existing  sound  propagation  methods: 
a  pure  geometric  technique  is  used  for  comparison  [150],  which  models  specular  reflection  as  well  as  edge 
diffraction  through  UTD;  a  pure  numerical  technique,  the  ARD  wave-solver  [124].  Comparisons  with  ARD 
are  done  only  in  a  limited  selection  of  scenes  (Building) ,  while  the  other  scenes  (Underground  Parking  Garage 
and  Reservoir)  are  too  large  to  fit  in  the  memory  using  ARD. 

Building.  As  the  listener  walks  behind  the  building,  we  observe  the  low-pass  occlusion  effect  with  smooth 
transition  as  a  result  of  diffraction.  We  also  observe  the  reflection  effects  due  to  the  surrounding  walls.  We 
show  how  sound  changes  as  the  distance  from  the  listener  to  the  walls  and  the  height  of  the  walls  vary. 
Underground  Parking  Garage.  This  is  a  large  indoor  scene  with  two  sound  sources,  a  human  and  a 
car,  as  well  as  vehicles  that  scatter  and  diffract  sound.  As  the  listener  walks  through  the  scene,  we  observe 
the  characteristic  reverberation  of  a  parking  garage,  as  well  as  the  variation  of  sound  received  from  various 
sources  depending  on  whether  the  listener  is  in  the  line-of-sight  of  the  sources. 

Reservoir.  We  demonstrate  our  system  in  a  large  outdoor  scene  from  the  game  Half-Life  2,  with  a 
helicopter  as  the  sound  source.  This  scene  shows  diffraction  and  scattering  due  to  a  rock;  it  also  shows  high- 
orcler  interactions  between  the  scattered  pressure  and  the  surrounding  terrain,  which  is  most  pronounced 
when  the  user  walks  through  a  passage  between  the  rock  and  the  terrain.  Interactive  auralization  is  achieved 
by  precomputing  the  IRs  at  a  grid  of  predefined  listener  positions.  We  also  make  the  helicopter  fly  and 
thereby  demonstrate  the  ability  to  handle  moving  sound  sources  and  high-order  diffractions. 

3.7.2  Error  Analysis 

In  Figure  3.5  we  compare  the  results  of  our  hybrid  technique  with  BEM  on  a  spatial  grid  of  listener  locations 
at  different  frequencies  for  several  scenes:  two  parallel  walls,  two  walls  with  a  ground,  an  empty  room,  and 
two  walls  in  a  room.  BEM  is  one  of  the  most  accurate  wave-based  simulators  available,  and  comparing 
with  high-accuracy  simulated  data  is  a  widely  adopted  practice  [15,  72,  56].  BEM  results  are  generated  by 
the  FastBEM  simulator.  A  comparison  with  a  geometric  technique  for  the  last  scene  is  also  provided.  The 
geometric  technique  models  8  orders  of  reflection  and  2  orders  of  diffraction  through  UTD. 
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We  also  compute  the  difference  in  pressure  field  (i.e.  the  error)  between  our  hybrid  technique  with 
varying  reflection  orders  and  BEM,  as  shown  in  Figure  3.6  for  the  “Two  Walls  in  a  Room”  scene.  The  error 
between  the  pressure  fields  generated  by  the  reference  wave  solver  and  by  our  hybrid  method  ,  is  computed 
as  ||-Pref  —  Pref||2/||Pref||i  where  Pref  and  Pref  are  vectors  consisting  of  complex  pressure  values  at  all  the 
listener  positions  and  1 1  •  1 1  denotes  the  two-norm  of  complex  values,  summed  over  all  positions  x  (the  grid 
of  listeners  as  shown  in  Figure  3.5).  Higher  reflection  orders  lead  to  more  accurate  results  but  require  more 
rays  to  be  traced. 

3.7.3  Complexity 

Consider  a  scene  with  k  objects.  We  perform  the  complexity  analysis  for  frequency  v  and  discuss  the  cost 
of  numerical  and  geometric  techniques  used. 

Numerical  Simulation  and  Pre-Processing:  The  pre-processing  involves  several  steps:  (1)  performing 
the  wave  simulation  using  numerical  techniques,  (2)  computing  per-object  and  inter-object  transform  matrix, 
and  (3)  solving  linear  systems  to  determine  the  strengths  of  incoming  and  outgoing  equivalent  sources  [98]. 
In  our  system,  the  equivalent  sources  are  limited  to  monopoles  and  dipoles,  and  the  complexity  follows: 

0{nnQP 2  +  K2nPQ 2  +  K(u\ogu)  +  k3P3),  (3.16) 

where  Q ,  P  are  the  number  of  incoming  and  outgoing  equivalent  sources  respectively,  n  is  the  number  of 
offset  surface  samples,  and  u  is  the  volume  of  an  object.  The  number  of  equivalent  sources  P  and  Q  scale 
quadratically  with  frequency. 

Ray  Tracing:  Assume  the  scene  has  T  triangles,  and  from  each  source  we  trace  Nr  rays  to  the  scene.  The 
cost  for  one  bounce  of  tracing  from  a  source  is  0(Nr\ogT)  on  average  and  0{NrT )  in  the  worst  case.  If 
the  order  of  reflections  modeled  is  d,  then  the  (worst  case)  cost  of  ray-tracing  is  0(NrTd).  This  cost  is 
multiplied  by  the  number  of  sources  (sound  sources  and  equivalent  sources)  and  the  number  of  points  where 
the  pressure  values  need  to  be  evaluated.  The  total  cost  is  dominated  by  computing  inter-object  transfer 
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functions,  where  the  pressure  from  P  outgoing  equivalent  sources  from  an  object  needs  to  be  evaluated  at  n 
sample  positions  on  the  offset  surface  of  another  object.  This  results  in 

0{n2PnTd)  (3.17) 


for  a  total  of  k2  pairs  of  objects  in  the  scene. 

In  our  collocated  equivalent  source  scheme,  however,  the  P  outgoing  sources  for  different  frequencies 
share  a  total  of  Pco i  positions.  The  rays  traced  from  a  shared  position  can  be  reused,  so  for  all  frequencies 
v,  we  only  need  to  trace  rays  from  Pcoi  positions  instead  of  P(y)  positions  . 

The  choice  of  Nr  is  scene-dependent.  In  theory,  in  order  to  discover  all  possible  reflections  from  all  scene 
triangles  without  missing  a  propagation  path,  the  ray  density  along  every  direction  should  be  high  enough 
so  that  the  triangle  spanning  the  smallest  solid  angle  viewed  from  the  source  can  be  hit  by  at  least  one  ray. 
The  problem  of  missing  propagation  paths  is  intrinsic  to  all  ray-tracing  methods.  It  can  be  overcome  by 
using  beam-tracing  methods  [45] ,  but  they  are  considerably  more  expensive  and  are  only  practical  for  simple 
scenes. 

The  order  of  reflection  d  also  depends  on  the  scene  configuration.  For  an  outdoor  scene  where  most 
reflections  come  from  the  ground,  a  few  reflections  are  sufficient.  In  enclosed  or  semi-enclosed  spaces  more 
reflections  are  needed.  In  practice  it  is  common  to  stop  tracing  rays  when  a  given  bound  of  reflection  is 
reached,  or  when  the  reflected  energy  is  less  than  a  threshold. 

Scalability  Although  the  computation  domain  of  the  numerical  solver,  ClNt  is  smaller  than  the  entire 
scene,  the  size  of  the  entire  scene  still  matters.  Larger  scenes  require  longer  IR  responses  and  therefore 
more  frequency  samples,  which  affect  the  cost  of  both  numerical  and  geometric  components  of  our  hybrid 
approach.  Larger  scenes  in  general  require  more  triangles,  assuming  the  terrain  has  the  same  feature  density. 
For  a  scene  whose  longest  dimension  is  L ,  the  number  of  IR  samples  (and  therefore  frequency  samples)  scales 
as  O(L),  and  the  number  of  triangles  scales  as  0(L2),  -  giving  overall  numerical  and  ray-tracing  complexities 
of  -  0{L)  and  0(LA  logL)  respectively.  This  is  better  than  most  numerical  methods;  for  example,  the  time 
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complexity  of  ARD  are  0(LA  log  L)  and  FDTD  scale  0(A4). 

We  tested  the  scalability  of  our  method  with  the  building  scene  by  increasing  the  size  of  the  scene 
and  measuring  the  performance.  The  results  are  shown  in  Figure  3.7.  Since  the  open  space  is  handled  by 
geometric  methods,  whose  complexity  of  the  geometric  method  is  not  a  direct  function  of  the  total  volume,  it 
is  not  necessary  to  divide  the  open  space  into  several  connected  smaller  domains,  as  some  previous  methods 
did  [124], 

3.7.4  Comparison  with  Prior  Techniques 

Compared  with  geometric  techniques,  our  approach  is  able  to  capture  wave  effects  such  as  scattering  and 
high-order  diffraction,  thereby  generate  sound  of  higher  quality.  When  compared  with  performing  numerical 
wave-based  techniques  such  as  ARD  and  BEM,  over  the  entire  domain,  our  approach  is  much  faster  as 
we  use  a  numerical  solver  only  in  near-object  regions,  as  opposed  to  the  entire  volume.  We  do  not  have 
a  parallel  BEM  implementation,  but  extrapolating  from  the  data  in  Figure  6,  FastBEM  would  take  100+ 
hours  for  Underground  Parking  Garage  and  1000+  hours  for  Reservoir  on  a  500-core  cluster  to  simulate 
sound  up  to  1  kHz,  assuming  full  parallelization.  In  comparison,  our  method  can  perform  all  (numeric, 
geometric,  and  coupling)  precomputations  in  a  few  hours  for  these  two  scenes  (as  shown  in  Table  3.1)  to 
achieve  interactive  runtime  performance  (see  Table  3.2).  Moreover,  numerical  techniques  typically  require 
memory  proportional  to  the  third  or  fourth  power  of  frequency  to  evaluate  pressures  and  compute  I’s  at 
different  listener  positions.  As  shown  in  Table  3.3,  our  method  requires  orders  of  magnitude  less  memory 
than  several  standard  numerical  techniques.  We  have  also  highlighted  the  relative  benefits  of  our  two-way 
coupling  algorithms  with  other  hybrid  methods  used  in  acoustic  and  electromagnetic  simulation  (see  Section 
2.3).  In  many  ways,  our  coupling  algorithm  ensures  continuity  and  consistency  of  the  field  computed  by 
numeric  and  geometric  techniques  at  the  artificial  boundary  between  their  computational  domains. 

The  method  proposed  by  Melira  et  al.  [98]  is  also  able  to  simulate  the  acoustic  effects  of  objects  in  large 
outdoor  scenes.  Their  formulation,  however,  only  allows  objects  to  be  situated  in  an  empty  space  or  on 
an  infinite  flat  ground,  and  therefore  cannot  model  large  indoor  scenes  (e.g.  parking  lot)  or  outdoor  scenes 
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Figure  3.5:  Comparison  between  the  magnitude  of  the  total  pressure  field  computed  by  our  hybrid  technique 
and  BEM  for  various  scenes.  In  the  top  row,  the  red  dot  is  the  sound  source,  and  the  blue  plane  is  a  grid  of 
listeners.  Errors  between  our  method  and  BEM  for  each  frequency  are  shown  in  each  row.  For  our  hybrid 
technique,  the  effect  of  the  two  walls  are  simulated  by  numerical  acoustic  techniques,  and  the  interaction 
between  the  ground  or  the  room  is  handled  by  geometric  acoustic  techniques.  For  BEM,  the  entire  scene 
(including  the  walls,  ground,  and  room)  is  simulated  together.  The  last  column  also  shows  comparison  with 
a  pure  geometric  technique  (marked  as  “GA” ) . 


with  uneven  terrains.  If  an  outdoor  scene  has  a  large  object,  the  algorithm  proposed  in  [98]  would  slow 
down  considerably.  The  coupling  with  geometric  propagation  algorithm,  on  the  other  hand,  enables  us  to 
model  acoustic  interactions  with  all  kinds  of  environment  features.  It  is  relatively  easier  to  extend  our  hybrid 
approach  to  inhomogeneous  environments  by  using  curved  ray  tracing.  Furthermore,  geometric  ray  tracing 


is  also  used  to  perform  frequency  decomposition  and  this  results  in  improved  sound  rendering. 


Scene 

air  vol. 
(m3) 

surf,  area 

(m2) 

FDTD 

ARD 

BEM/ 

FMM 

Ours 

Bldg+small 

1800 

660 

0.2  TB 

5  GB 

6  GB 

12  MB 

Bldg+med 

3200 

1040 

0.3  TB 

9  GB 

9  GB 

12  MB 

Bldg+large 

22400 

3840 

2.2  TB 

60  GB 

34  GB 

12  MB 

Reservoir 

5832000 

32400 

578  TB 

16  TB 

307  GB 

42  MB 

Parking 

9000 

2010 

0.9  TB 

24  GB 

2  GB 

9  MB 

Table  3.3:  Memory  Cost  Saving.  The  memory  required  to  evaluate  pressures  at  a  given  point  of  space. 
This  corresponds  to  the  same  operation  shown  in  the  rightmost  column  of  performance  Table.  Compared  to 
standard  numerical  techniques,  our  method  provides  3  to  7  orders  of  magnitude  of  memory  saving  on 
the  benchmark  scenes. 
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Figure  3.6:  Error  ||Pref  —  Pref||2/||Pref||  between  the  reference  wave  solver  (BEM)  and  our  hybrid  technique 
for  varying  maximum  order  of  reflections  modeled.  The  tested  scene  is  the  ’’Two  walls  in  a  room”  (see  also 
Figure  3.5,  last  column). 

3.8  Limitations,  Conclusion,  and  Future  Work 


We  have  presented  a  novel  hybrid  technique  for  sound  propagation  in  large  indoor  and  outdoor  scenes.  The 
hybrid  technique  combines  the  strengths  of  numerical  and  geometric  acoustic  techniques  for  the  different 
parts  of  the  domain:  the  more  accurate  and  costly  numerical  technique  is  used  to  model  wave  phenomena 
in  near-object  regions,  while  the  more  efficient  geometric  technique  is  used  to  handle  propagation  in  far-held 
regions  and  interaction  with  the  environment.  The  sound  pressure  held  generated  by  the  two  techniques  is 
coupled  using  a  novel  two-way  coupling  procedure.  The  method  is  successfully  applied  to  different  scenarios 
to  generate  realistic  acoustic  effects. 

Our  approach  has  a  few  limitations.  Currently  our  geometric  technique  assumes  homogeneous  medium 
and  traces  straight  ray  paths.  However,  in  the  case  of  inhomogeneous  medium,  where  the  speed  of  sound  is 
not  constant  and  the  rays  may  travel  in  curved  paths,  a  nonlinear  ray-tracing  module  can  be  integrated  into 
our  hybrid  system  instead. 

The  performance  of  our  spatial  decomposition  depends  greatly  on  the  size  of  f lN .  Although  it  size  is 
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Figure  3.7:  Breakdown  of  Precomputation  Time.  For  a  building  placed  in  terrains  of  increasing 
volumes  (small,  medium,  and  large  walls),  the  yellow  part  is  the  simulation  time  for  the  numerical  method, 
and  the  green  part  is  for  the  geometric  method.  The  numerical  simulation  time  scales  linearly  to  the  largest 
dimension  (L)  of  the  scene  instead  of  the  total  volume  (V). 


smaller  than  the  entire  simulation  domain,  an  individual  f lN  may  still  be  too  large,  especially  when  the  wave 
effects  near  a  large  object  need  to  be  computed  and  this  increases  the  complexity  of  our  algorithm.  One 
interesting  topic  to  investigate  is  the  possibility  of  not  enclosing  the  whole  object,  but  only  parts  of  it  (e.g. 
small  features)  in  flN . 

We  currently  compare  our  simulation  results  with  simulated  data  from  a  high-accuracy  BEM  solver. 
It  would  be  an  important  future  work  to  validate  these  results  with  recorded  audio  measurements,  when 
accurate  measurements  with  binaural  sound  recordings  and  spatial  sampling  in  complex  environments  are 
available. 

Additionally  our  approach  and  system  implementation  is  currently  limited  to  mostly  static  scenes  with 
moving  sound  sources  and/or  listeners.  Nonetheless  the  use  of  transfer  functions  lays  the  foundation  for 
future  extension  to  fully  dynamic  scenes,  as  the  per-object  transfer  functions  of  an  object  can  be  reused 
even  when  the  object  is  moved.  In  order  to  recompute  inter-object  transfers  as  multiple  objects  move  in  a 
dynamic  scene,  a  large  number  of  rays  (the  number  of  outgoing  sources  for  all  frequency  samples  multiplied 
by  thousands  of  rays  emitted  per  source)  need  to  be  retraced.  We  would  like  to  explore  the  use  of  the 
Fast  Multipole  Method  (FMM)  [53]  to  reduce  the  number  of  outgoing  sources  for  far-held  approximations. 
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The  computation  of  transfer  function  is  currently  implemented  with  unoptimized  MATLAB  code,  and  using 
high-performance  linear  solvers  (CPU-  or  GPU-based)  can  greatly  improve  the  performance. 
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Chapter  4 


Analytic  Ray  Curve  Tracing  for 
Outdoor  Sound  Propagation 

4.1  Overview 

Outdoor  sound  propagation  presents  challenging  scenarios  for  computational  simulation  with  inhomogeneous, 
moving  media  and  complex  obstacles.  Existing  methods  rely  on  simplified  models  of  media  conditions  or 
obstacles  with  limited  complexity  to  keep  the  computational  cost  low.  Numerical  models  that  handle  gen¬ 
eral  media  and  arbitrary  media  boundaries  do  not  scale  well  with  the  size  and  complexity  of  the  scenes, 
making  them  prohibitively  expensive  for  large  three-dimensional  outdoor  scenes.  In  this  paper,  a  ray  tracing 
method  using  analytic  ray  curve  as  tracing  primitives  is  presented  to  improve  the  efficiency  of  outdoor  sound 
propagation  in  fully  general  settings.  This  ray  curve  tracer  inherits  the  efficiency  and  flexibility  of  recti¬ 
linear  ray  tracers  in  handling  boundary  surfaces,  and  it  overcomes  the  performance  limitations  imposed  by 
approximating  the  curved  propagation  paths  in  inhomogeneous  media  with  rectilinear  rays.  Adaptive  media 
traversal  as  well  as  acceleration  structures  for  surfaces  intersection  leads  to  further  savings  in  computation. 
Speedup  up  to  an  order  of  magnitude  is  demonstrated  on  outdoor  benchmark  scenes  in  comparison  with 


existing  ray  models.  This  ray  tracer  serves  as  a  stand-alone  tool  for  fast  outdoor  sound  propagation,  and  it 
is  complementary  to  existing  geometric  and  numerical  methods  and  extendable  in  multiple  ways. 

4.2  Introduction 

Sound  propagation  in  outdoor  environments[133,  11,  73]  and  room  acoustics[85]  are  traditionally  investigated 
separately.  Room  acoustics  focus  on  the  high  order  interactions  that  sound  wave  makes  with  surfaces  within 
an  enclosed  space,  assuming  homogeneous  media.  On  the  other  hand,  outdoor  sound  propagation  (atmo¬ 
spheric  and  underwater)  deals  with  spatially  varying  as  well  as  moving  media,  and  the  curved  propagation 
trajectories  resulted  from  refraction  is  the  most  prominent  departure  from  indoor  models.  In  the  outdoor 
context,  obstacles  that  constitute  media  boundaries  are  generally  assumed  to  be  sparse,  and  the  complexity 
of  the  boundary  surfaces  is  limited  in  the  investigations. 

However,  many  outdoor  applications  can  benefit  from  simultaneous  modeling  of  fully  general  media  and 
complex  boundaries,  examples  include  large  expanse  of  complex  terrains  and  sprawling  urban  area  that  often 
has  its  own  micro-climate.  As  the  data  on  the  spatially  and  temporally  varying  atmosphere  and  the  ocean 
becomes  increasingly  available,  and  geographic  information  of  terrains  and  man-made  structures  come  in  ever 
richer  details,  it  would  be  ideal  that  the  sound  propagation  can  utilize  the  full  scale  of  such  data.  Existing 
numerical  methods  such  as  FDTDs  suffer  from  performance  and  scalability  issues  with  large  domain  volume, 
high  frequency,  or  complex  boundaries.  Geometric  acoustics  (GA)[115]  methods  like  ray  models  are  efficient 
at  handling  boundary  surfaces,  but  with  inhomogeneous  media  the  rectilinear  rays  are  forced  to  advance  in 
small  steps. 

One  way  to  speed  up  ray  models  for  inhomogeneous  media  involves  subdividing  the  media  into  spatial 
cells,  so  that  within  each  cell  the  media  assumes  simple  profile  that  leads  to  an  analytic  curved  trajectory. 
The  cell  method  was  proposed  as  exactly  such  a  model  based  on  a  triangular  subdivision  of  two-dimensional 
media,  and  it  has  been  implemented  in  some  acoustic  simulation  software[130,  172].  A  later  work[101] 
for  both  light  and  sound  propagation  followed  the  idea  of  cell  method  and  decomposed  three-dimensional 
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media  profile  into  tetrahedral  cells.  Mesh  connectivity  is  utilized  to  trace  rays  across  neighboring  cells,  and 
boundary  surfaces  are  embedded  into  the  mesh  so  that  boundary  intersection  is  solved  in  the  same  mesh 
traversing  process. 

This  paper  develops  the  idea  of  cell  method  further,  and  improves  upon  the  tetrahedral  cell  method[101] 
in  three  important  aspects  of  the  algorithm: 

1.  Simple  ray  formulation  We  select  the  parabolic  ray  curve  to  be  the  ray  tracing  primitives,  giving 
the  simplest  analytic  form  of  trajectory,  intersection,  and  ray  properties  (Sec. 4. 4). 

2.  Implicit  cell  We  use  a  mesh- less  approach  for  media  traversal,  tracing  ray  curve  segments  of  adaptive 
sizes  based  on  on-tlre-fly  sampling  of  the  media  profile.  This  approach  of  implicit  cells  avoids  the  costly 
mesh  construction,  and  it  supports  moving  media  as  well  as  dynamic  media  (Sec. 4. 5. 2). 

3.  Acceleration  of  surface  intersection  we  adapt  hierarchical  acceleration  structures  used  in  recti¬ 
linear  ray  tracer  for  the  ray  curve  tracer.  Further  acceleration  is  achieved  by  spatial  bounding  of  ray 
curves  based  on  their  geometric  properties  for  higher  culling  efficiency  (Sec. 4. 5. 3). 

Overall  this  analytic  ray  curve  tracer  is  designed  to  be  efficient  for  both  general  media  profile  and  complex 
surfaces.  Its  performance  is  demonstrated  on  outdoor  benclrmarks(Sec.4.6),  showing  significant  improve¬ 
ments  over  previous  ray  models  and  software  tools.  It  is  complementary  to  a  set  of  numerical  and  geometric 
methods,  and  can  be  extended  in  multiple  ways  as  discussed  in  Sec.  4.7. 

4.3  Prior  Work 

Outdoor  sound  propagation  has  been  studied  extensively  in  the  context  of  underwater[73]  as  well  as  at¬ 
mospheric  acoustics[133].  Early  methods  include  Fast  Field  Program  (FFP)[39]  and  Parabolic  Equation 
(PE)  [102,  50].  Both  methods  provide  frequency  domain  full  wave  solutions  with  simplifying  assumptions 
about  the  media.  More  general  numerical  methods  have  been  proposed  to  handle  arbitrary  media  and 
obstacles,  the  most  popular  one  being  the  Finite  Difference  Time  Donrain(FDTD)  method  that  solves  the 
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linearized  Euler  equation[110,  175].  The  disadvantage  of  the  FDTD  is  mainly  its  high  computational  costs, 
in  particular,  it  does  not  scale  well  with  the  volume  of  the  simulation  domain  or  the  frequency.  Other 
frequency  domain  methods  including  FEM[103,  164],  BEM[26],  Pseudo  Spectral  Time  Domain[67,  66],  and 
Transmission  Line  Matrix  (TLM)[64,  52,  13]  face  similar  challenges  in  scalability,  making  them  prohibitively 
expensive  for  large-scale  broadband  simulation. 

Geometric  acoustics  (GA)[115]  is  a  family  of  methods  that  are  widely  used  in  room  acoustics  [85].  They 
rely  on  the  high  frequency  approximation  of  wave  propagation  with  rays,  which  are  perpendicular  to  the 
wave  fronts  and  can  be  intersected  efficiently  with  scene  surfaces.  Various  GA  techniques  were  developed, 
including  image  source  method[7,  22],  ray  tracing[167,  149],  frustum  tracing[148],  beam  tracing[46,  32], 
sonal  mapping[19,  38,  79],  path  tracing[136],  and  radiosity[160].  The  sound  speed  is  generally  assumed  to 
be  constant  so  rectilinear  rays  are  used  to  trace  out  straight-line  paths.  Complex  interactions  are  modeled 
at  ray-surface  intersections,  such  as  specular  reflection,  Snell’s  Law  refraction,  diffuse  reflection  [149,  136], 
and  Bidirectional  Reflectance  Distribution  Function  (BRDF)[79,  137]. 

The  ray  models  have  also  been  applied  to  inhomogeneous  outdoor  media  by  numerical  integration  of  the 
differential  ray  equation,  effectively  tracing  segments  of  rectilinear  rays  (see  survey  [133,  73]).  The  segment 
size  is  limited  by  the  range  within  which  the  media  can  be  assumed  to  be  homogeneous,  and  intersection  with 
surfaces  need  to  be  tested  for  each  segment.  With  complex  media  the  great  number  of  small  ray  segments 
dominate  the  propagation  computation.  Higher  order  integration  like  the  fourth-order  Runge-Kutta  method 
is  adopted  to  improve  efficiency,  but  with  those  methods  the  segment  between  each  advancement  of  the  ray 
integration  is  no  longer  rectilinear,  complicating  intersection  tests  with  surfaces.  Several  widely  used  software 
are  based  on  ray  models,  including  HARPO[75],  BELLHOP [118],  WaveQ3D[126]  in  underwater  acoustics, 
HARPA[74]  in  atmospheric  acoustics  (see  [107]  for  the  collection).  Although  performance  of  those  ray- 
based  software  are  orders  of  magnitude  better  than  wave-based  alternatives  (a  few  seconds  vs.  thousands  of 
seconds  on  2D  propagation  in  layered  media,  reported  recently [168,  43]),  they  quickly  becomes  prohibitively 
expensive  with  general  media  and  three-dimensional  complex  scenes. 


91 


4.4  Analytic  Ray  Curve 


Ray  models  simulate  the  wave  propagation  with  rays  that  are  perpendicular  to  the  wavefront.  For  certain 
media  profiles,  the  ray  trajectories  as  well  as  the  pressure  on  the  ray  have  analytic  form  that  can  be  evaluated 
at  any  point  along  the  ray  with  constant  cost.  Our  ray  tracer  utilizes  one  of  such  analytic  formulations, 
chosen  for  its  particularly  simple  form,  and  the  resulting  ray  curve  lends  itself  to  easy  intersections  with 
boundary  surfaces.  As  can  be  seen  from  the  algorithm  outline  in  Fig.  4.2,  this  particular  ray  curve  enables 
analytic  evaluations  of  constant  per-ray  cost  in  key  steps  of  the  ray  tracing  (green-colored  blocks) .  This  ray 
formulation  is  a  special  case  of  the  more  general  ray  theory  discussed  in  depth  in  Cerveny’s  comprehensive 
work[161]. 


4.4.1  Ray  formulation 

Within  a  media  with  spatially  varying  sound  speed,  we  denote  V(xi)  =  c  as  the  sound  speed  at  location 
Xi .  The  arrival  time  or  travel  time  field  T  is  defined  as  the  time  that  a  sound  wave  takes  to  travel  from 
its  source  to  a  field  location,  and  the  spatial  derivative  of  T:  p  =  VT  is  called  the  slowness  vector.  Within 
such  a  media,  the  ray  trajectories  can  be  solved  from  the  general  Hamiltonian  form  of  the  Eikonal  equation, 
which  in  Cartesian  coordinates  reads: 


-\PiPi)n/2  -  =  1,2,3, 


(4.1) 


and  n  is  a  real  number.  Equation  4.1  can  be  solved  in  terms  of  characteristics :  3-D  trajectories  xt  =  Xi(u ) 
along  which  4.1  is  satisfied,  parameterized  by  u.  The  characteristic  system  of  Equation  4.1  is: 


d X{ 
d-u 


( PkPk)n/ 2  1Pu 


d pi 

du 


dr 

du 


( PkPk)n/ 2  =  v~n. 


ld_ 

n  dxi 


( 


_L) 

Vn>' 


(4.2) 


When  the  gradient  of  V  n  is  constant,  4.2  can  generally  be  solved  by  quadrature.  When  we  assume  the 
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Figure  4.1:  (a)  Parabolic  ray  curve:  shown  in  r-z  plane,  3-axis  being  the  direction  of  VP-2.  Rays  with  two  different 
initial  directions  (marked  by  arrows)  are  shown  here.  ( Xf,Zf )  defining  the  vertex  of  the  parabola  is  computed  from 
Eq.  4.6.  (b)  Efficiency  of  curve  tracing:  with  constant  ||  V V— "  || ,  one  segment  of  analytic  curve  (blue  line)  suffices 
as  compared  to  many  steps  (magenta  circles)  taken  by  numerical  ray  integration.  The  trajectory  traced  out  by  RK-4 
visibly  diverges  from  the  parabolic  curve,  which  can  be  corrected  by  further  reducing  step  sizes,  (c)  Adaptive 
segments  of  curves:  with  logarithmic  sound  speed  profile,  our  ray  tracer  proceeds  in  segments  of  parabolic  curves 
with  adaptive  sizes,  evident  from  the  visualization  of  the  spheres  of  validity  (Sec.  4.5.2). 

gradient  of  V~2  is  constant  (i.e.  n  =  2):  P(x)-2  =  Aq  +  A  •  x,  we  obtain  the  simplest  analytic  form  for  ray 
trajectory  and  travel  time,  in  terms  of  the  parameter  u  =  a: 


Xi{a)  =  xi0+pi0(a  -  cr0)  +  ^(ct  -  <R))2,  (4.3) 

PiW)  =  Pm  +  \Ai(a  -  ao ),  (4-4) 

T(a)  =  T(ct0)  +  V0~2(a  -  <r0)  +  ^Aipi0(a  -  a0)2  +  ^AiA^cr  -  <r0)3,  (4.5) 


and  a  is  related  to  the  travel  time  T  and  to  arclength  s  by  da  =  V2d T  =  Vd s. 

4.4.2  Ray  properties 

We  take  advantage  of  two  key  properties  of  the  ray  curve  given  in  Sec.  4.4.1: 
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1.  Intersection  the  ray  is  a  parabolic  plane  curve  and  can  be  intersected  analytically  with  planar  surfaces 
to  solve  for  intersection  point. 

2.  Evaluation  properties  including  but  not  limited  to  position,  tangent  direction,  slowness  vector,  arrival 
time,  and  arclength  can  be  evaluated  analytically  for  arbitrary  parameter  a  along  the  ray,  at  constant 
cost. 

It  is  evident  from  equation  4.3  that  the  analytic  ray  lies  in  a  plane  with  the  normal  of  A  x  po-  We  denote 
the  direction  of  the  media  gradient  A  as  the  z-axis,  and  (A  xpo)  x  A  as  the  r-axis,  the  r-z  plane  is  then  called 
the  ray  plane.  When  the  origin  of  the  coordinate  system  is  placed  at  the  ray  origin  xo,  the  ray  trajectory 
can  be  written  as: 


z(r)  =  (r-rf)2  -zf, 


Tf  = 


2£o\/— £o2 


n0‘ 


>zf  = 


n0 2  - 


a 


(4.6) 


where  a  =  ||A||  is  the  magnitude  of  media  gradient,  £o  =  V(O)~1cos0o >  V(0)  is  the  sound  speed  at  ray  origin, 
and  9o  is  the  angle  between  the  ray  direction  at  origin  and  the  r  axis  (see  Figure  4.1).  The  parabolic  plane 
curve  given  by  Eq.  4.6  has  closed-form  solutions  in  terms  of  intersection  with  planar  surfaces. 

For  an  arbitrary  point  on  the  ray  curve  with  coordinates  ( rp,zp )  in  the  ray  plane,  the  corresponding 
parameter  ap  can  be  solved  from  Equation  4.3  as  ap  =  rp/£o  +  <Jo ,  where  <jq  is  the  a  at  ray  origin.  Given  ap, 
the  position  x(ap),  slowness  vector  p(ap),  and  travel  time  T(ap)  are  given  by  Eq.  4. 3, 4. 4, 4. 5,  respectively. 
The  tangent  direction  t,  coincides  with  the  direction  of  the  slowness  vector  p  and  can  be  evaluated  by: 
t(crp)  =  p((Xp)/||p(crp)||.  The  arclength  s  of  the  parabolic  ray  curve  can  be  evaluated  at  ( rp,zp )  by 


\J  ( arp  +  b)2  +  1  (arp  +  b)  +  arcsinh(arp  +  b) 


s(rp )  = 


where  a  =  -  -k  ,  b  = 

2£o2 


2a 

V-Zo  +  ^(Q)-2 


(4.7) 


Similar  to  rectilinear  rays,  the  parabolic  ray  curve  assuming  constant  W  2  comes  with  efficient  oper¬ 
ations  for  finding  intersection  and  updating  the  key  variables  (position,  direction,  arclength,  travel  time, 
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Figure  4.2:  Left  steps  of  tracing  one  segment  of  analytic  curve,  among  which  the  green-colored  blocks  have  analytic 
solutions  due  to  the  parabolic  ray  formulation,  leading  to  large  savings  of  ray  tracing  cost.  Right  Besides  the  analytic 
formulation,  other  sources  of  performance  improvements  lead  to  our  ray  tracer  tracing  fewer  segments  with  fewer 
intersection  tests,  and  cheaper  intersection  tests  as  well.  Those  algorithmic  improvements  are  connected  to  the  steps 
in  the  flow  chart  where  they  happen. 


etc.)  at  intersection  points,  making  it  amenable  to  be  used  as  a  tracing  primitive.  The  ray  curve  serves  as  a 
building  block  for  a  ray  tracer  that  handles  general  media,  by  computing  the  propagation  paths  consisting 
of  consecutive  segments  of  such  ray  curves.  The  flow  chart  in  Fig.  4.2  illustrates  the  steps  involved  in  trac¬ 
ing  one  such  segment,  with  the  computational  savings  highlighted  by  green-colored  blocks.  The  sources  of 
performance  improvements  are  listed  on  the  right  side  of  Fig.  4.2,  and  the  ray  tracing  algorithm  is  detailed 
next. 
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4.5  Ray  Tracing  Algorithm 


4.5.1  Tracing  Analytic  Segments 

Our  ray  tracer  computes  the  propagation  paths  in  a  segment-by-segment  fashion,  each  segment  being  a 
parabolic  ray  curve  formulated  in  Sec.  4.4.  There  are  two  criteria  for  terminating  a  segment  and  starting 
the  next  one: 

1.  Media  being  traversed  no  longer  satisfy  the  assumption  of  constant  VV~2, 

2.  Surfaces  of  media  boundary  or  obstacles  in  the  scene  are  encountered. 

For  the  former  we  proposed  an  adaptive  segment  size  computed  from  on-the-fly  sampling  of  the  media, 
discussed  in  details  in  Subsec.  4.5.2,  and  for  the  latter  we  used  acceleration  data  structures  and  methods 
adapted  from  rectilinear  ray  tracer,  discussed  in  Subsec.  4.5.3.  Both  aspects  of  the  ray  tracing  algorithm 
contributes  to  its  efficiency  and  performance  improvements  over  previous  methods. 

Based  on  those  criteria,  we  find  out  how  far  a  particular  segment  goes,  by  computing  the  closest  intersec¬ 
tion  point  for  surface  encounter,  and  the  point  that  the  ray  curve  reach  the  edge  of  the  range  of  validity  for 
the  constant  media  gradient  assumption.  The  closest  point  between  the  two  cases  are  taken  as  the  end  point 
of  the  current  segment,  at  which  a,  x(<r),  t(a),  T(a),  s(a)  are  evaluated.  The  next  segment  starts  at  x(cr), 
the  trajectory  of  which  is  computed  from  the  local  media  gradient  Vb^2(x((j)).  a,  T(a),  s(a)  are  continued 
over  the  next  segment,  and  the  ray  direction  t(a)  is  also  continued  except  for  the  surface  encounter  case.  At 
surface  encounter  a  reflection  direction  t'  is  computed  based  on  the  surface  reflection  model  as  the  initial 
direction  of  the  next  segment. 

The  ray  traversal  continue  in  this  segment-by-segment  way  until  it  reaches  outside  the  media  volume, 
it  interacts  with  surfaces  a  pre-determined  number  of  times,  or  the  pressure  amplitude  along  the  ray  drops 
below  a  pre-set  threshold.  The  ray  tracer  finds  the  set  of  propagation  paths  starting  from  the  sound  source, 
and  the  ray  segments  making  up  those  paths  can  be  queried  for  any  field  location  to  compute  the  pressure 
held. 
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4.5.2  Adaptive  Media  Traversal 


Within  a  smoothly  varying  media  with  local  coherence,  the  media  gradient  W-2  can  be  assumed  to  be 
constant  within  a  certain  range  of  validity.  This  is  a  valid  assumption  for  naturally  occurring  media  such  as 
the  atmosphere  or  the  ocean  under  stable  conditions.  Furthermore,  our  ray  tracer  uses  the  heuristics  that 
the  range  of  validity  for  a  locally  constant  W-2  should  be  proportional  to  ||  VW— 2 1| ,  so  that  the  media 
gradient  remains  valid  for  a  relatively  small  spatial  range  in  areas  of  great  variations.  Such  adaptive  sizing 
facilitates  accurate  traversal  of  the  media  without  the  cost  of  uniformly  small  segment  size. 

Based  on  this  heuristics,  we  compute  the  range  of  validity  d(x)  by  5  =  |(W_2)d3(x),  which  is  controlled 
by  a  global  parameter  S.  Given  the  local  Vb“2(x),  the  ray  curve  given  by  Eq.  4.6  is  computed,  and  continued 
until  the  border  of  a  sphere  centered  at  the  ray  origin  x  and  with  the  radius  d(x).  The  parabolic  ray  curve 
has  closed-form  intersection  with  its  validity  sphere,  and  the  exit  point  x'  from  the  validity  sphere  is  used 
to  sample  the  media  again  for  the  next  segment. 

Given  an  input  media  profile  V(x),  on-tlre-fly  sampling  of  W-2(x)  can  be  computed  either  analytically 
if  the  input  profile  is  available  as  an  analytic  function,  or  by  differentiation  techniques  such  as  central 
differences.  Moving  media  can  be  approximated  by  the  standard  effective  sound  speed  approach,  adding 
to  or  subtracting  the  wind  velocity  u>(x)  from  V(x),  for  upwind  or  downwind  propagation  respectively. 
Particularly  for  our  ray  tracer,  the  on-the-fly  media  gradient  sampling  enables  handling  of  vector  wind  field 
by  adding  the  wind  speed  projected  onto  the  ray  direction  V(x)  +  w(x)  •  t(x). 

4.5.3  Handling  Boundary  Surfaces 

For  existing  rectilinear  ray  tracers,  ray  intersections  with  complex  surfaces  in  the  scene  is  the  most  expensive 
part  of  the  computation,  and  many  acceleration  structures  and  techniques  have  been  developed  to  speed 
it  up.  The  parabolic  ray  curve  we  use  as  tracing  primitives  can  be  used  with  the  hierarchical  acceleration 
structures  adapted  from  rectilinear  ray  tracing  that  bound  the  scene  surfaces.  The  ray  curves  themselves 
can  also  be  bounded  spatially  for  further  culling  to  save  intersection  costs. 

For  scene  surfaces  culling  we  built  a  BVH  (Bounding  Volume  Hierarchy)  [58]  to  group  boundary  surfaces 
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and  bound  them  spatially.  A  BVH  is  a  hierarchical  tree  structure  whose  internal  nodes  are  bounding  volumes 
(BVs),  and  we  choose  AABBs  (axis-aligned  bounding  boxes)  for  the  shape  of  the  nodes.  The  parabolic  ray 
curve  can  be  intersected  with  an  AABB  much  faster  than  with  all  its  enclosed  surfaces,  and  when  the  ray 
curve  and  the  AABB  are  not  intersecting,  all  those  surfaces  can  be  culled  from  further  computations.  To 
find  the  closest  intersection  with  the  scene  surfaces,  the  BVH  is  traversed  by  each  ray  curve  in  a  top-down 
fashion.  Logarithmic  intersection  time  on  average  is  achieved  with  regard  to  the  number  of  surfaces  in  the 
scene.  In  addition,  the  BVH  can  be  re-fitted  or  rebuilt  efficiently  to  accommodate  dynamic  scenes. 

Besides  spatial  culling  of  the  scene  surfaces  with  a  BVH,  we  also  perform  spatial  bounding  and  culling  of 
ray  curves.  As  we  traverse  the  media  adaptively  as  described  in  Sec.  4.5.2,  we  compute  a  bounding  sphere 
(the  range  of  validity  sphere)  for  each  segment  as  a  by-product.  When  intersecting  such  a  segment  with 
the  scene  BVH,  the  culling  efficiency  is  improved  by  intersecting  the  bounding  sphere  of  the  segment  with 
the  BVs  first,  which  is  a  simpler  computation  than  intersecting  the  ray  curve  directly.  Furthermore,  as  the 
parabolic  ray  is  a  plane  curve,  the  ray  plane  is  used  to  reduce  the  dimension  of  the  intersection  test  which 
achieves  further  culling. 

At  the  intersection  point  with  a  boundary  surface,  the  interaction  can  be  modeled  by  specular  reflection, 
Snell’s  law  refraction,  or  BRDF-based  reflection,  depending  on  the  surface  properties,  and  the  direction  of 
the  secondary  ray  segment  can  be  computed  accordingly.  A  surface  impedance  model  can  be  used  to  modify 
the  amplitude  and  the  phase  of  the  pressure  along  the  ray,  in  addition  to  direction  change. 

4.6  Results 

4.6.1  Performance  Comparison  of  Ray  Models 

Ray  models  are  the  foundation  of  many  software  tools  that  are  widely  used  in  underwater  acoustic  simula¬ 
tion  [130,  172,  118,  126],  and  to  a  lesser  degree  in  atmospheric  acoustics [74].  The  majority  of  the  available 
tools  assume  either  range-independent  media  or  profiles  that  vary  in  two  dimensions,  with  the  exception  of 
HARPO  and  HARPA,  which  computes  three-dimensional  Hamiltonian  ray  tracing  on  clusters.  The  existing 
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software  is  written  in  different  languages  with  different  performance  characteristics  (e.g.  MATLAB,  Fortran), 
the  available  performance  data  are  scarce  and  some  of  them  rather  dated.  We  reviewed  the  data  we  could 


find  in  Sec.  4.3,  and  here  we  compare  the  performance  between  our  method  and  numerical  ray  integration 
that  we  implemented  in  our  own  code  base. 

Existing  ray  models  use  direct  numerical  integration  of  the  ray  equations  with  Euler  method  or  higher 
order  such  as  fourth  order  Runga-Kutta  (RK4).  The  size  of  the  integration  timestep  is  an  important  factor 
that  controls  the  trade-off  between  performance  and  accuracy.  Our  ray  tracer  is  similarly  controlled  by 
the  global  parameter  S  (Sec.  4.5.2)  ,  which  also  leads  to  smaller  steps  producing  more  accurate  results  yet 
longer  computation  time.  To  highlight  this  trade-off,  We  conduct  two  types  of  comparisons:  same  quality 
comparisons,  for  which  ray  models  are  tuned  to  achieve  the  same  level  of  errors  and  their  performances  are 
compared;  and  same  speed  comparisons,  for  which  ray  models  are  run  at  the  same  level  of  performance,  and 
the  resulted  errors  are  compared. 

We  measure  the  accuracy  of  ray  tracer  by  the  two  variables  that  are  crucial  for  pressure  computation: 
hit  location  of  the  ray  at  a  particular  range  and  the  arclength  of  the  entire  trajectory.  To  quantify  the 
difference  between  two  sets  of  results  rt  and  rt  tracing  the  same  set  of  initial  rays,  we  compute  the  relative 
error:  Erel  = 

We  select  two  media  profiles  as  our  test  cases:  downward-refracting  atmosphere  with  sound  speed  gradient 
of  0.1,  and  the  Munk  profile  used  in  ocean  acoustics.  Both  profiles  have  W~2  and  VE  in  analytic  form, 
which  decouple  the  measurements  of  ray  tracing  accuracy  from  the  accuracy  of  gradient  estimation.  We 
also  limit  our  test  to  2D  cases  with  a  flat  ground  as  obstacle,  for  fair  comparison  with  the  majority  of 
existing  ray-based  software.  The  performance  improvements  for  3D  cases  is  expected  to  be  greater,  which  is 
demonstrated  in  Section  V.B.4.6.2. 


Downward  refracting  atmosphere 

The  first  media  profile  we  tested  is  a  linear  sound  speed  profile  that  leads  to  a  downward  refracting  atmo¬ 
sphere:  c(z)  =  c(0)  +0.1m/s,  where  c(z )  is  the  effective  sound  speed  at  the  height  z,  and  c(0)  is  the  effective 
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Figure  4.3:  Downward  refracting  atmosphere:  ray  plot  of  201  rays  computed  by  the  analytic  ray  curve  tracer. 


sound  speed  on  the  ground.  We  compute  the  ray  tracing  results  for  the  range  0  —  lOfcm  and  the  height 
1  —  1  km.  The  ray  tracing  results  are  gathered  and  compared  at  the  lOfcm  mark,  at  which  the  rays  have  been 
reflected  up  to  25  times.  An  example  of  the  computed  ray  trajectories  can  be  seen  in  Fig.  4.3,  with  a  fan  of 
201  rays  between  the  elevation  angle  -90°and  90°. 

The  same  quality  and  the  same  speed  comparison  between  our  ray  tracer  and  the  RK4  ray  integrator 
are  listed  in  Table  4.1  and  4.2.  To  achieve  the  same  error  in  ray  tracing  results,  our  ray  tracer  runs  an 
order  of  magnitude  faster  than  the  numerical  integrator.  On  the  other  hand,  with  the  same  running  time 
the  numerical  integrator  produce  ray  tracing  results  with  higher  levels  of  error.  For  this  comparison  only 
the  computation  of  ray  trajectories  and  arclength  are  included,  while  the  pressure  computation  is  excluded. 
However,  the  higher  number  of  ray  steps  used  by  the  numerical  integrator  indicate  that  with  additional 
pressure-related  computations  at  each  step  the  performance  improvements  of  our  ray  tracer  will  be  even 
higher. 


Rel.  Error 

le-2 

5e-3 

le-3 

5e-4 

2e-4 

Ray  Curves 

219 

892 

2,810 

3,972 

8,881 

(segments/path) 

RK4  Steps 

467 

2,330 

4,881 

24,588 

49,158 

(steps/path) 

Table  4.1:  Same  quality  comparison.  A  fan  of  rays  are  traced  under  a  downward  refracting  atmosphere  to  a 
range  of  10km.  At  each  level  of  relative  error  in  the  ray  tracing  results,  the  average  number  of  ray  curve  segments 
per  propagation  path  is  compared  with  the  average  number  of  Runga-Kutta  4  steps.  Analytic  ray  curve  is  able  to 
achieve  same  level  of  accuracy  with  much  less  tracing  cost. 


We  also  test  the  scalability  of  the  ray  models  with  increasing  magnitude  of  media  gradient  in  Fig.  4.4 
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#  Ray  steps/ 
RK4  segments 

1  x  102 

5  x  102 

1  x  103 

5  x  103 

1  x  104 

RT  rel.  error 
RK4  rel.  error 

1.31e-2 

4.35e-2 

2.76e-3 

9.44e-3 

8.04e-4 

4.92e-3 

1.98e-4 

2.14e-3 

4.33e-5 

9.12e-4 

Table  4.2:  Same  speed  comparison.  A  fan  of  rays  are  traced  under  a  downward  refracting  atmosphere  to  a  range 
of  10km.  With  comparable  number  of  ray  curve  segments  and  Runga-Kutta  4  steps,  our  ray  tracer  (RT)  is  able  to 
achieve  lower  levels  of  relative  error  in  ray  tracing  results  (both  hit  points  and  path  length)  than  RK4. 


Runqa-Kutta  4  Step  Size  Magnitude  of  Sound  Speed  Gradient 

Figure  4.4:  Left:  Cost-accuracy  trade-off  of  RK  4,  to  achieve  lower  error  the  step  size  of  RK4  needs  to  decrease, 
which  leads  to  exponential  growth  in  computation  time.  Right:  Cost  with  increasing  media  gradient  grows  as  both 
RK  4  and  ray  curve  tracer  adopt  smaller  steps  to  keep  the  error  below  2  x  10— 4 ,  but  RK  4  shows  both  higher  cost 
and  lower  scalability  with  media  gradient. 


right.  The  media  gradient  is  increased  gradually  from  0.04  to  0.4  with  0.04  increments,  and  the  achieved 
relative  error  is  kept  at  2  x  10~4  by  tuning  the  step  size  for  RK  4  and  the  parameter  <5  for  our  ray  curve 
tracer.  Our  ray  tracer  is  shown  to  scale  better  with  increased  media  variations,  which  we  attribute  to  the 
adaptive  segment  size  we  used. 


Munk  profile 

The  second  media  profile  we  tested  is  the  Munk  profile  commonly  used  in  ocean  acoustics: 
c(z)  =  15001.0  +  0.00737[a;  —  1  +  exp(-x)],  where  c(z)  is  the  effective  sound  speed  at  the  depth  z,  z  <  5000m. 
We  compute  the  ray  tracing  results  for  the  range  0  —  100/cm  and  the  depth  0  —  5 km.  An  example  of  the 
computed  ray  trajectories  can  be  seen  in  Fig.  4.7,  with  a  fan  of  21  rays  between  the  elevation  angle  -13°and 
13°.  The  relative  errors  of  the  numerical  integrator  results  with  decreasing  step  sizes  is  plotted  in  Fig.  4.6, 
contrasted  with  the  same  cost-accuracy  analysis  for  the  analytic  ray  tracer.  Again,  for  this  comparison  only 
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Figure  4.5:  The  range  of  validity  for  each  segment  of  ray  curve  is  controlled  by  the  parameter  <5  and  adapts  to  local 
media  gradient.  (a)Munk  profile  sound  speed.  (b,c,d)  Segment  sizes  at  spatial  locations  within  the  Munk  profile 
visualized,  with  5  =  0.002,  0.001,  0.0005  respectively. 


Runga-Kutta  4  Step  Size  Ray  Curve  Validity  Range  Control  Parameter  a 


Figure  4.6:  Munk  profile  with  200km  range,  ray  tracing  cost-accuracy  trade-off  of  Left:  Runga-Kutta  4  and  Right: 
analytic  ray  curve. 


the  computation  of  ray  trajectories  and  arclength  are  included,  while  the  pressure  computation  is  excluded. 

For  this  profile  we  also  show  the  adaptive  segment  size  adopted  by  our  ray  tracer  by  visualizing  in  false 
color  (Fig.  4.5)  the  size  distribution  in  the  domain.  We  trace  rays  spanning  the  launch  angles  between 
-90°and  90°for  this  visualization,  and  we  contrast  it  with  a  false  color  visualization  of  the  magnitude  of 
gradient  for  Munk  profile.  The  local  coherence  of  media  properties  in  the  Munk  profile  is  taken  advantage  of 
by  our  adaptive  ray  tracer  to  focus  computation  and  improve  efficiency.  It  is  evident  from  the  visualization 
that  with  a  uniform  step  size  the  performance  will  reduce  significantly  even  with  the  same  analytic  ray 
formulation. 
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Figure  4.7:  Munk  profile  characteristic  ray  plot  computed  by  analytic  ray  tracer. 

4.6.2  Outdoor  Applications  of  Analytic  Ray  Tracer 

To  show  the  application  of  our  ray  tracer  on  fully  general  outdoor  scenes,  we  generate  a  representative  media 
profile  from  the  stratified-plus-fluctuation  model  that  is  widely  used  in  atmospheric  acoustics[133].  The 
acoustic  index  of  refraction  in  the  atmosphere  n  =  Cq/c,  where  Co  is  the  reference  sound  speed,  is  modeled 
with  a  stratified  component  nstr  and  a  fluctuation  component  rifiu,  so  that  n  =  nstr  +  rifiu.  The  stratified 
component  follows  a  logarithmic  profile  of  the  altitude  z:  nstr{z )  =  Co/(co  +  6  In  fj-  +  1^ ),  with  parameters 
no,  b ,  and  zg.  cq  is  the  sound  speed  at  the  ground  surface,  taken  as  the  reference  sound  speed,  and  zg  is  the 
roughness  length  of  the  ground  surface.  Typical  values  for  parameter  b  are  1  m/s  for  a  downward- refracting 
atmosphere  and  —1  m/s  for  an  upward- refracting  atmosphere.  The  fluctuation  component  models  the  three- 
dimensional  random  temperature  and  wind  speed  turbulence  in  the  atmosphere.  The  value  at  position  x 
can  be  computed  as  n/;M(x)  =  ]TV  G(kj)  cos(ki  •  x  +  tpi),  where  k,  is  the  wave  vector  describing  the  spatial 
frequency  of  the  fluctuation,  ipi  is  a  random  angle  between  [0,  2-7t],  and  G(ki)  is  a  normalization  factor. 

The  Desert  and  Christmas  benchmarks  represent  large- volume  outdoor  acoustic  scenes  that  have  complex 
surface  geometry  (e.g.  varying  terrains  and  buildings).  The  surface  primitive  count  and  the  volumetric 
expanse  of  the  two  scenes  are  listed  in  Table  4.3.  Both  scenes  are  visualized  in  Fig.  4.8,  and  we  include 
illustrative  3D  ray  trace  for  both  upward  refractive  and  downward  refractive  media  conditions. 
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Figure  4.8:  Outdoor  benchmarks  Desert  and  Christmas  village.  Example  ray  paths  are  shown  with  upward  and 
downward  refraction.  The  media  gradient  is  exaggerated  and  the  ray  path  number  is  kept  very  small  for  illustration 
purpose.  See  Table  4.3  for  scene  stats  and  actual  performance  numbers  of  tracing  10k  rays  up  to  3  orders  of  reflections. 


Figure  4.9:  Spatial  visualization  of  ray  tracing  cost:  brighter  color  indicates  higher  intersection  costs.  It  is 
shown  that  the  intersection  computation  concentrates  in  areas  of  dense  surface  geometry  and  around  silhouette  of 
obstacles,  which  illustrates  that  the  spatial  culling  of  ray-surface  intersections  using  BVH  is  effective.  Left:  Desert 
Right:  Christmas  village. 
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Scenes 

#  surf. 

#  seg  -jj.  BV  int.  #  surf.  int. 

per  path  per  seg  per  seg 

frame 

time 

Desert 

(200  x  200 
x50  m3) 

5,000 

31.88  2.32 

26.73  (94%)  (1.47%) 

446  ms 

Christmas 

(120  x  80 

x50  m3) 

16,000 

34.31  2.08 

44.45  (96.16%)  (1.22%) 

917  ms 

Table  4.3:  Breakdown  of  ray  tracing  time:  the  computation  of  closed-form  ray  trajectory  and  closed-form 
evaluations  of  ray  properties  for  each  segment  takes  very  little  computation  (less  than  0.001%  of  total  frame  time) 
and  thus  is  omitted  here.  The  percentage  that  each  kind  of  computation  takes  relative  to  the  total  frame  time  is 
included  in  parenthesis,  together  with  the  number  of  ray-BV  and  ray-surface  intersections. 


For  each  benchmark  we  trace  10K  rays  for  up  to  three  orders  of  reflections,  and  the  overall  ray  tracing 
performance  as  well  as  a  breakdown  of  the  running  time  are  reported  in  Table  4.3.  We  can  see  that  the 
ray  intersection  computation  dominates  the  total  ray  tracing  cost,  and  the  majority  of  the  intersections 
are  computed  with  Bounding  Volumes  (BVs).  The  intersection  numbers  with  surface  primitives  are  kept 
very  low  for  each  path,  which  shows  that  the  spatial  culling  is  effective  by  traversing  the  BVH.  A  detailed 
visualization  in  Fig.  4.9  demonstrates  this  point  further  that  the  intersection  cost  is  concentrated  in  the 
vicinity  of  obstacles,  whereas  for  the  majority  of  the  scenes  the  ray  intersections  are  resolved  with  relatively 
low  cost.  The  analytic  ray  tracer  achieved  close  to  interactive  performance  for  both  benchmarks,  and  it 
scales  well  with  the  complexity  in  the  media  and  the  boundaries. 


4.7  Discussions 

Our  ray  tracer  can  be  extended  in  ways  including  but  not  limited  to: 

•  coupling  with  flow  models  and  physically  realistic  media  profiles  as  inputs; 

•  augmenting  GA  methods  with  capability  to  handle  inhomogeneous  media; 

•  combining  with  other  methods  to  form  hybrid  propagation  algorithms. 

The  analytic  ray  curve  tracer  takes  general  media  profile  as  input,  as  along  as  the  profile  can  be  sampled 
for  sound  speed  and  gradient  at  any  spatial  locations.  The  input  can  be  an  analytic  function  (e.g.  Monin- 
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Obukhov  similarity  model[112,  141])  or  comes  on  a  set  of  discrete  sample  points  (e.g.  a  grid  input).  Numerical 
methods  have  been  coupled  with  a  background  flow  model  (e.g.  linearized  Euler  equations[21,  182]),  so  that 
detailed  media  conditions  output  by  the  flow  model  serves  as  input  to  propagation  simulation.  Our  ray 
tracer  is  also  compatible  with  such  a  coupling. 

GA  methods  such  as  path  tracing[136]  and  sonal  mapping[79]  model  complex  surface  interactions  includ¬ 
ing  diffuse  reflections  and  surface  BRDFs,  but  they  assume  homogeneous  media  and  straight  line  propagation 
in  between  surfaces.  Our  method  can  substitute  the  trajectory  computation  in  between  surfaces  to  account 
for  inhomogeneous  media,  while  working  seamlessly  with  the  surface  handling  part.  Such  augmented  GA 
solutions  can  be  applied  on  simulating  large  indoor  space  such  as  auditoriums  or  dense  outdoor  urban  areas. 

The  efficiency  of  our  method  makes  it  a  good  candidate  to  form  hybrid  schemes  with  wave-based  numerical 
models.  One  such  possibility  is  to  couple  the  ray  curve  tracer  with  ESM  (Equivalent  Source  Method)  with  a 
spatial  and  frequency  division,  similar  to  Yeh  et  al.  [179] .  The  ray  curve  tracer  is  focused  on  high  frequency 
and  long  range  propagation  in  the  sparse  space  in  between  complex  scatterers.  Alternatively,  our  ray  tracer 
can  be  used  for  fast  initial  evaluations  of  the  sound  field  for  a  wide  area,  and  costly  numerical  methods  such 
as  FDTD  can  be  applied  economically  to  areas  of  significant  field  complexity [128,  129,  18,  163]. 

The  analytic  ray  curve  tracer  has  several  limitations.  The  foundation  of  this  method  is  geometric  ray 
model,  inherently  a  high  frequency  approximation  as  opposed  to  a  full  wave  solution.  The  efficiency  improve¬ 
ment  from  tracing  analytic  ray  curves  depend  on  media  coherence,  and  it  will  degenerate  to  the  performance 
of  numerical  ray  integration  if  the  media  is  chaotic  without  much  spatial  coherence. 

4.8  Conclusion 

This  paper  addresses  the  challenge  of  efficient  sound  propagation  in  inhomogeneous,  moving  media  and 
large  outdoor  scenes  with  complex  boundary  surfaces.  Our  method  adapts  the  geometric  ray  models  to 
inhomogeneous  media,  and  the  key  performance  improvement  upon  rectilinear  rays  is  achieved  by  tracing 
analytic  ray  curves  based  on  local  media  gradients  as  primitives.  Segments  of  ray  curves  are  computed 
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by  sampling  the  media  gradient  on-the-fly,  accounting  for  both  inhomogeneous  and  moving  media  without 
the  need  to  pre-compute  explicit  cell  structures.  Acceleration  based  on  BVH  is  readily  adapted  to  speed 
up  surface  intersections  of  the  ray  curves,  enabling  logarithmic  scaling  with  scene  complexity  and  allowing 
dynamic  scenes.  The  improved  performance  of  this  method  is  demonstrated  in  comparison  to  rectilinear  ray 
models,  as  well  as  on  scenarios  that  are  prohibitively  expensive  with  existing  methods.  In  addition  to  being 
an  efficient  stand-alone  tool,  this  method  also  has  the  potential  of  complementing  other  GA  and  numerical 
methods. 
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Chapter  5 


Outdoor  Sound  Propagation  with 
Analytic  Ray  Curve  Tracer  and 
Gaussian  Beam 

5.1  Overview 

Outdoor  sound  propagation  benefits  from  algorithms  that  are  computationally  efficient  in  handling  inhomo¬ 
geneous  media,  complex  boundary  surfaces,  and  large  spatial  expanse.  One  recent  work  [101]  proposed  a  ray 
tracing  method  using  analytic  ray  curves  as  tracing  primitives,  which  retains  the  efficiency  of  the  ray  model 
in  handling  surface  interactions,  while  overcoming  the  performance  penalty  incurred  by  inhomogeneous  me¬ 
dia  on  rectilinear  rays.  In  this  paper,  we  develop  an  algorithm  for  sound  field  computation  of  outdoor 
propagation  that  combines  this  fast  ray  tracer  and  the  Gaussian  beam  model.  Validation  of  the  algorithm 
is  performed  by  comparison  to  published  results  on  benchmarks  in  atmospheric  and  ocean  acoustics.  The 
application  of  this  algorithm  is  demonstrated  on  a  scene  with  terrains  and  buildings  of  realistic  complexity 
and  a  set  of  media  profiles.  Our  algorithm  is  able  to  compute  characteristic  sound  field  for  fully  general 
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media  profiles  and  complex  3D  scenes  at  close-to-interactive  speed. 


5.2  Introduction 

Sound  propagation  in  outdoor  environment [133,  11,  73],  including  atmospheric  and  underwater  acoustics, 
deals  with  spatially  varying  as  well  as  moving  media.  Empirical  models,  real-world  measurements,  and  com¬ 
putational  flow  simulation  can  all  supply  the  initial  media  profiles  as  input  to  sound  propagation.  Obstacles’ 
shape  and  material  properties  play  an  important  role  as  well,  especially  for  scenes  with  complex  terrains  or 
area  with  dense  man-made  structures.  As  data  that  describe  the  media  conditions  and  the  scene  obstacles 
become  increasingly  available  with  ever  richer  details,  methods  for  outdoor  sound  propagation  needs  to  be 
able  to  account  for  the  full  scale  of  those  data. 

Existing  methods  face  many  challenges  in  handling  such  complexity,  they  either  make  assumptions 
that  preclude  a  fully  general  scene,  or  they  become  prohibitively  expensive  with  large  and  complex  scenes. 
Models  such  as  Fast  Field  Program  (FFP),  Parabolic  Equation  (PE),  and  normal  modes  [133,  11,  73]  fall 
in  the  former  category,  while  general  numerical  methods  like  Finite  Difference  Time  Domain  (FDTD)  [24], 
Finite  Elements  Method  (FEM)[153],  and  Boundary  Element  Method  (BEM)[26]  belong  to  the  latter.  For 
outdoor  settings  many  methods  are  focused  on  modeling  the  media  complexity,  while  obstacles  are  generally 
assumed  to  be  sparse  and  their  complexity  limited  in  the  investigations. 

Geometric  acoustics  (GA)[115]  methods  like  ray  models  are  known  for  their  efficiency  in  handling 
boundary  surfaces  under  the  assumption  of  homogeneous  media  and  rectilinear  paths.  A  recent  work[101] 
has  also  attempted  to  accommodate  inhomogeneous  media  by  tracing  parabolic  ray  curves  as  primitives, 
which  significantly  accelerates  path  computation.  On  the  other  hand  ray  models  suffer  from  artifacts  in 
caustics  and  shadow  zones  when  computing  fields,  and  models  such  as  Gaussian  beam[161]  performs  better 
in  this  regard.  However,  the  performance  of  Gaussian  beam  can  be  hindered  by  the  underlying  numerical 
ray  integration  that  remains  slow  for  inhomogeneous  media. 

We  combines  the  performance  of  the  ray  curve  tracer[101]  and  the  accuracy  of  the  Gaussian  beam[161] 
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into  an  algorithm  for  outdoor  sound  propagation.  In  particular: 


1.  We  compute  analytic  solutions  to  on-ray  pressure  as  well  as  paraxial  fields  based  on  the  parabolic  ray 
formulation  (Sec.  Ill),  which  leads  to  efficient  field  computation  that  matches  the  efficiency  of  the  path 
computation. 

2.  We  combine  Gaussian  beam  with  the  analytic  ray  formulation,  and  we  validate  the  approach  on  simple 
benchmarks  by  comparing  to  published  results  generated  by  established  techniques  in  both  atmospheric 
and  oceanic  settings  (Sec.  IV). 

3.  We  apply  the  algorithm  on  a  complex  outdoor  scene  with  realistic  configurations,  which  demonstrate 
its  efficiency  in  generating  characteristic  sound  field  (Sec.  V). 

Overall  we  provide  a  validated  solution  to  outdoor  sound  propagation  that  augments  a  fast  analytic 
ray  tracer  with  equally  fast  analytic  field  computations.  This  algorithm  takes  general  media  and  scene  input 
and  computes  the  full  3D  sound  field  at  close-to-interactive  speed,  which  can  be  useful  for  a  wide  range  of 
outdoor  sound  applications  (Sec.  VI5.7). 

5.3  Prior  Work 

Outdoor  sound  propagation  has  been  studied  extensively  in  the  context  of  underwater[73]  and  atmospheric 
acoustics [133].  Here  we  first  review  numerical  methods  that  provide  full  wave  solutions,  including  various 
hybrid  schemes  aiming  at  reducing  the  high  computation  cost  of  those  wave-based  methods.  Secondly  ray- 
based  methods  are  reviewed  with  their  advantages  and  limitations  for  outdoor  scenarios. 

5.3.1  Wave-based  Methods 

Early  methods  include  Fast  Field  Program  (FFP)[39]  and  Parabolic  Equation  (PE)  [102,  50],  both  provide 
frequency  domain  full  wave  solutions  that  account  for  the  inhomogeneous  media,  but  with  simplifying  as¬ 
sumptions.  Given  scenarios  that  meet  those  assumptions,  these  early  models  have  been  thoroughly  validated 
[65,  180,  127,  142,  14]  and  often  serve  as  reference  solutions  to  test  other  models. 
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Among  the  more  general  numerical  methods  that  handle  arbitrary  media  and  obstacles,  the  Finite 
Difference  Time  Domain  (FDTD)  method  is  widely  used  that  solves  the  linearized  Euler  equation[110,  175]. 
FDTD  have  been  coupled  with  background  flow  simulation [21,  182],  applied  with  turbulence [176,  34],  various 
ground  conditions [36],  terrains[61,  37,  41],  and  complex  obstacles[109,  3,  42].  As  a  time  domain  method  it 
is  also  used  for  pulse  propagation[90,  3].  The  disadvantage  of  FDTD  is  mainly  its  limited  scalability  with 
the  domain  volume  or  the  frequency,  making  it  prohibitively  expensive  for  large-scale  broadband  simulation. 
FDTD  has  been  limited  to  low  frequency  for  it  to  be  practical  for  wide  area  assessment  [60] .  Other  frequency 
domain  methods  such  as  FEM[103,  164]  face  similar  challenges  in  scaling  with  volume  and  frequency. 

Hybrid  methods  were  developed  that  use  FDTD  in  a  confined  area  and  apply  PE  to  propagate  over 
long  range  with  relatively  sparse  space[128,  129,  18].  Alternatively,  methods  such  as  BEM[26]  or  Equivalent 
Source  Method  (ESM)[105]  were  employed  to  limit  the  computation  either  to  boundary  surfaces,  or  to 
volumes  that  bound  scatterers  tightly.  However,  BEM  needs  to  be  coupled  with  specialized  Green  function 
for  refractive  media[119,  177],  and  it  scales  poorly  with  surface  area  and  frequency.  ESM  was  coupled  with 
ray  models[179]  to  handle  large  domains,  but  this  hybrid  method  does  not  scale  well  with  the  number  or 
complexity  of  scatterer  objects.  Methods  such  as  Pseudo  Spectral  Time  Domain[67,  66]  and  Transmission 
Line  Matrix  (TLM)[64,  52,  13]  are  more  efficient,  but  they  are  still  fundamentally  limited  by  the  cost  of 
discretizing  a  large  domain.  A  more  recent  method,  Adaptive  Rectangular  Decomposition  (ARD)[123,  99], 
took  advantage  of  the  analytic  solution  of  wave  equation  in  a  rectangular  domain,  but  it  requires  constant 
sound  speed  within  each  spatial  subdivision,  which  is  not  easily  adapted  to  a  general  media  profile. 

5.3.2  Geometric  Acoustics  Methods 

Geometric  acoustics  (GA)  methods [115]  are  widely  used  in  room  acoustics[85],  to  handle  the  high  order 
surface  interactions  under  the  valid  assumption  of  a  homogeneous  media.  Examples  include  the  image  source 
method[7,  22],  ray  tracing[167,  149],  frustum  tracing[148],  beam  tracing[46,  32],  sonal  mapping[19,  38,  79], 
path  tracing[136],  and  radiosity[160]. 

The  ray  models  have  also  been  applied  to  inhomogeneous  media[28,  73]  by  numerically  integrating  the 
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Figure  5.1:  Algorithm  overview. 


ray  equations.  A  sparse  set  of  rays  can  be  efficiently  traced  to  plot  out  the  sound  propagation  paths,  but 
long  range  propagation  in  this  manner  becomes  expensive.  When  the  ray  models  are  used  to  compute  the 
pressure  field  in  addition  to  the  propagation  paths,  they  are  known  to  have  issues  in  the  caustic  zones  and 
in  the  shadow  zones.  With  some  ray  models  another  source  of  inefficiency  in  inhomogeneous  media  is  the 
need  to  locate  eigen  rays.  The  Gaussian  beam  approach[161]  was  developed  in  seismology  and  applied  on 
underwater[16]  and  atmospheric [48]  acoustics,  which  improves  the  accuracy  in  caustics  and  shadow  zones 
and  eliminates  the  need  to  locate  eigen  rays.  However,  when  the  underlying  path  still  consists  of  rectilinear 
ray  segments  The  performance  of  the  Gaussian  beam  method  is  limited  by  the  numerical  integration  step 
sizes.  A  recent  work[101]  achieved  significant  performance  improvement  by  replacing  the  rectilinear  ray  with 
parabolic  ray  curve.  Adaptive  segment  sizes  based  on  on-the-fly  media  sampling  as  well  as  acceleration 
structure  that  bounds  surfaces  and  ray  curves  leads  to  further  speedup.  We  give  an  overview  of  this  ray 
tracer  in  the  next  section,  before  we  introduce  our  algorithm  that  combines  this  ray  tracer  with  the  Gaussian 
beam. 
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5.4  Algorithm 


An  overview  of  our  outdoor  sound  propagation  algorithm  is  illustrated  in  Figure  5.1.  We  built  upon  a 
very  efficient  ray  tracer  that  computes  propagation  paths  made  up  of  segments  of  analytic  curves  (Sec. 
A5.4.1).  Given  the  output  from  the  ray  tracer,  we  compute  a  set  of  additional  variables  per  ray  segment 
that  are  subsequently  used  for  pressure  computation  both  on  the  ray  paths  and  in  the  paraxial  regions. 
Those  variables  are  computed  by  analytic  evaluations  of  constant  cost,  which  extends  the  efficiency  of  paths 
computation  to  pressure  computation.  The  algorithm  for  on-ray  pressures  and  ray  paraxial  fields  are  given 
in  Sec.  5.4.2  and  5.4.3,  respectively.  The  mathematical  derivation  that  leads  to  this  algorithm  is  a  special 
case  of  the  more  general  ray  theory  framework  discussed  in  depth  in  Cerveny’s  comprehensive  work[161]. 
They  can  be  found  in  the  Appendix  to  this  paper  for  completeness  of  the  presentation. 

5.4.1  Analytic  Ray  Curve  Tracer 

Given  a  media  profile  with  spatially  varying  sound  speed  V(xi),  the  slowness  vector  is  the  spatial  derivative 
of  the  travel  time  held  T,  p  =  VT.  Assuming  a  constant  gradient  of  V~2:  V(x)~2  =  A$  +  A  ■  x  within  a 
certain  range,  the  trajectory  is  a  parabolic  curve  that  lies  in  the  plane  with  the  normal  of  A  x  po  (the  ray 
plane,  see  Figure  5.2).  The  intersection  between  the  parabolic  ray  curve  and  planar  surfaces  can  be  solved 
analytically,  and  key  properties  such  as  position  x,  tangent  direction  t  which  coincides  the  direction  of  p,  p 
and  T  can  be  computed  for  any  point  along  the  ray  by  analytic  evaluations  of  constant  cost,  as  long  as  those 
properties  are  known  at  one  point  along  the  ray  (e.g.  the  source). 

This  analytic  ray  formulation  enables  a  ray  tracer  that  computes  propagation  paths  in  a  general  media 
consisting  of  consecutive  segments  of  parabolic  curves.  The  ray  properties  are  computed  only  at  the  end 
points  and  are  continued  across  segments.  In  spatially  coherent  media,  the  assumption  of  constant  W~2 
generally  holds  for  a  range  larger  than  the  assumption  of  constant  V,  enabling  the  ray  curve  tracer  to  advance 
in  longer  segments  than  rectilinear  ray  tracer,  which  is  one  of  the  key  sources  of  performance  improvements. 
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(a)  (c) 

Figure  5.2:  (a)  the  analytic  ray  curve  in  ray-plane.  Rays  of  two  different  Oq  are  drawn  in  blue  and  magenta, 
respectively.  (x/,Zf)  is  the  vertex  of  the  parabola.  Assuming  a  locally  constant  W-2,  of  which  the  range 
of  validity  determines  the  extent  of  the  ray  segment.  The  two  red  dots  mark  the  end  points  of  ray  segments, 
where  the  rays  exit  from  this  sphere  of  validity  (Sec.  A).  (b,c)  analytic  evolution  of  P,  Q  are  performed 
for  each  segment  by  transforming  into  and  evolving  in  Cartesian  coordinates  before  transforming  back  to 
ray-centered  coordinates  (Sec.  B). 

5.4.2  Dynamic  Ray  Tracing 

Dynamic  ray  tracing  keeps  track  of  how  a  set  of  derivatives  in  ray-centered  coordinates  progress  among 
propagating  rays,  and  the  derivatives  subsequently  are  used  in  computation  of  pressure  and  travel  time  along 
the  ray.  For  the  parabolic  ray  curve,  the  derivatives  in  Cartesian  coordinates  can  be  evolved  analytically 
between  any  two  points  within  a  ray  curve,  and  the  transformation  between  Cartesian  coordinates  and  ray- 
centered  coordinates  can  also  be  evolved  analytically,  thereby  we  achieve  analytic  evolution  of  the  derivatives 
in  ray-centered  coordinates  (See  Fig.  5.2(b,c)). 

First  we  define  the  coordinates  involved.  At  any  point  along  a  particular  ray  D,  the  ray-centered, 
coordinates  q±,  q2,  93  is  defined  with  origin  at  that  point.  Ray  Q  is  the  93-axis  of  the  system,  the  91-axis 
and  92-axis  are  taken  to  be  perpendicular  to  93-axis,  and  mutually  perpendicular.  The  unit  basis  of  the 
ray-centered  coordinates  are  denoted  by  el,  el,  ej.  The  3x3  transformation  matrix  from  the  ray-centered 
coordinates  9 /.  to  the  Cartesian  coordinates  x*  is  denoted  by  H,  Hu-  =  dxi/dqk  =  dq^/Oxi  =  e^i,  i,k  =  1,  2, 3. 
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The  transformation  matrix  from  the  Cartesian  coordinates  to  ray-centered  coordinates  is  thus  H -1  =  HT . 

The  derivatives  we  seek  in  dynamic  ray  tracing  captures  how  a  set  of  rays’  spatial  relationship  with 
each  other  changes  as  they  travel  through  a  media  profile.  Consider  an  orthonormal  system  of  rays  starting 
from  a  sound  source,  parameterized  by  two  ray  parameters  71,  72,  taken  here  as  the  take-off  azimuth  (f> 0  and 
elevation  *o  angles.  The  two  2x2  matrices  of  derivatives  in  the  ray-centered  coordinates  Q  and  P  are  defined 
with  elements  Qu  =  (dqi / d^j)T=ConSt,  Pij  =  {dp\q) /d^j)T=ConSt,  I,J  =  1,2.  Correspondingly,  Q(x)  and 
P(x)  in  Cartesian  coordinates  are  defined  with  elements  Q^j  =  (dxi/d'yj)<T=const,  =  (dp\x^ /d"fj)a=const, 
i  =  1, 2, 3,  J  =  1, 2.  We  have  Q(x)  =  HQ,  P(x)  =  HP. 

Under  the  assumption  of  constant  VV~2,  the  P,  Q  are  evolved  between  two  points  within  the  same  ray 
curve  segment  by  analytically  evolving  Q(x),  and  H ,  in  the  steps  shown  in  Fig.  5.2(b,c).  J  =  detQ  is 
the  ray  Jacobian,  used  in  the  geometric  spreading  term  |  J  | 1//2 ,  which  in  turn  relates  the  pressure  amplitude 
at  point  s  to  the  pressure  amplitude  at  the  source  So  by: 


Pray(s) 


p{s)V(s)J(s0) 

[p(s0)V(s0)J(s) 


}1/2P(s  0), 


(5.1) 


where  p  is  the  density  of  the  media.  Both  P  and  Q  in  ray-centered  coordinates  are  also  used  in  computing 
the  paraxial  field,  in  Sec.  C. 


5.4.3  Field  Computation  with  Gaussian  Beam 

Given  the  dynamic  ray  tracing  results  for  points  along  the  ray  paths,  the  Gaussian  beam  provides  a  model 
to  approximate  paraxial  fields  in  the  vicinity  of  the  ray  paths,  which  involves  computing  both  paraxial  travel 
time  and  pressure  amplitude. 

Paraxial  Travel  Time 

Looking  at  the  definition  of  P  and  Q  which  we  compute  analytically  by  dynamic  ray  tracing,  we  introduce 
the  2x2  matrix 
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M  =  PCT\  Mu  =  (dp^  /dqj)T=const, 


I,  J=  1,2. 


(5.2) 


Recall  the  slowness  vector  p  is  the  first  derivative  of  T,  M  is  therefore  the  second  derivative  of  T  with  respect 
to  ray-centered  coordinates.  For  a  point  R'  in  the  vicinity  of  a  ray  f l,  the  paraxial  travel  time  at  R'  can  be 
computed  given  the  T  at  a  point  R  on  O: 

T(R',  R)  =  T(R)  +  ^qT(R')M(R)q(R'),  q  =  («i,  92)T,  (5.3) 

when  VR  is  the  plane  perpendicular  to  Q  that  passes  R' ,  and  point  R  is  the  intersection  of  the  ray  f 1  and 
O^.  The  derivatives  of  T  can  also  be  approximated  in  Cartesian  coordinates,  in  which  case  any  point  i?7  on 
the  ray  that  is  close  to  R!  can  be  selected,  and  computing  ll1-  and  R  are  not  necessary  (see  Appendix  for 
details). 

Paraxial  Pressure  Amplitudes 

There  are  multiple  ways  compute  paraxial  pressure  amplitudes  and  the  Gaussian  beam  model  is  one  of  them. 
It  computes  a  paraxial  amplitude  centered  on  the  ray  with  a  Gaussian  drop-off,  which  is  achieved  by  allowing 
the  matrix  M  to  be  complex:  M  =  Re(M)  +  Im(M).  Im(M)  is  chosen  to  be  positive  definite,  so  that 

pbeam(R')  =  Prav (R)exp[—iu}(—T(R)  -  iqT(R,)M(f?)q(i?'))]  (5-4) 

=  Prav(R)exp[-iuj(-T(R)  -  ^qT (R')Re(M(R))q(R’))}  (5.5) 

x  exp[—  ^wqT(i?,)Im(M(f?))q(f?')].  (5.6) 

Matrices  with  suffix  a  (M°,  P°,  Qa)  represents  the  matrices  of  the  actual  field  (Eq.  5.2). 

The  contributions  of  Gaussian  beams  are  then  summed  up  by  integral  superposition.  As  shown  in  the 
rightmost  block  of  Fig.  5.1,  we  gather  all  segments  of  ray  curves  that  pass  in  the  vicinity  of  a  given  field 
point,  and  we  compute  the  paraxial  pressure  amplitude  and  travel  time  from  each  segment  and  sum  those 
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up.  For  field  computation  that  involves  large  volumes  of  points,  we  spread  some  of  the  costs  of  locating 
vicinity  ray  segments  by  reversing  the  process  and  distribute  pressure  contribution  from  each  beam  to  the 
field  points  it  covers. 


5.5  Validation 


We  validated  our  algorithm  on  two  benchmark  scenes  of  atmospheric  and  oceanic  sound  propagation.  The 
benchmarks  are  selected  from  literature  with  published  results  computed  by  alternative  methods.  Simple 
scenarios  are  modeled  in  those  benchmarks,  of  which  the  sound  field  characteristics  can  be  easily  inter¬ 
preted,  and  for  which  validated  results  computed  by  multiple  methods  are  available  for  direct  compar¬ 
isons.  After  the  validity  of  our  method  is  established  with  these  benchmarks,  application  on  a  real-world 
oriented  scene  with  a  more  complex  media  conditions  is  presented  in  Section  V.  We  computes  and  vi¬ 
sualize  the  sound  field  in  terms  of  the  transmission  loss  (TL)  for  all  benchmarks,  which  is  defined  as: 

rjn 2Q  Iq  ( Pressure  at  a  field  point) 

y  ( Pressure  of  free  field  at  1  m  from  source )  * 


5.5.1  Benchmark  A 

Inhomogeneous  atmosphere,  flat  ground  with  impedance 

In  Attenborough  et  al.[12],  a  set  of  benchmark  cases  for  outdoor  sound  propagation  is  proposed,  and 
results  generated  by  a  range  of  methods  show  good  agreements,  including  FFP,  PE,  normal  modes,  ray 
and  beam  tracing.  The  boundary  surface  in  the  scene  is  a  flat  ground  with  impedance,  while  the  media  is 
inhomogeneous  with  three  different  profiles. 

Media  profile:  sound  speed  c(x)  at  spatial  location  x  with  height  z[x)  is  given  by 
case  1:  downward  refractive  c(x )  =  343  +  0.1  *  z(x)(m/s); 
case  2:  upward  refractive  c(x)  =  343  —  0.1  *  z{x){m/s)\ 

case  3:  duct  condition:  case  1  for  z(x)  <  100m,  case  2  for  100m  <  z(x)  <  300?n, 
and  constant  c  for  z(x)  >  300 m. 
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Case  1 


Case  2 


Case  3 


Range  (m) 


Figure  5.3:  Benchmark  A  Range-TL  plot:  Source  height  hs  =  5m,  receiver  height  hr  =  1  m,  range  10 km.. 
Columns  contain  results  for  media  profiles  case  1,  2,  and  3,  while  rows  contain  results  for  frequency  10,  100, 
and  1000  Hz.  Comparison  with  Fig.  12-14[12]. 


Range  (km)  Range (km) 


Figure  5.4:  Benchmark  A  2D  field:  Source  height  hs  =  5 to,  receiver  height  hr  =  1  m,  frequency  10Hz. 
2D  vertical  field  of  height  up  to  1  km  and  range  up  to  lOfcm  is  visualized  on  the  left,  and  the  corresponding 
contour  plot  is  shown  on  the  right  for  comparison  with  Fig.  15  [12] . 
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Ground  impedance:  A  four  parameter  model  is  used  to  compute  the  impedance  of  the  flat  ground, 


and  the  same  parameters  from  Attenborough  et  al.  [12]  are  used. 

Results:  As  shown  in  Fig.  5.3  and  5.4,  our  algorithm  is  able  to  replicate  the  interference  pattern  for 
this  set  of  media  conditions  at  three  different  frequencies.  ID  range  plot  and  2D  vertical  field  of  resulting 
TL  is  included  for  direct  comparison  with  figures  in  [12].  The  number  of  rays  required  is  as  low  as  21  rays 
to  achieve  the  results,  more  rays  can  be  traced  to  compute  asymptotically  more  accurate  pressure  fields. 

5.5.2  Benchmark  B 

Munk  profile  with  conical  seamount 

We  validate  our  algorithm  on  an  underwater  benchmark  with  the  standard  Munk  profile  and  a  conical 
seamount  as  bathymetry.  Published  results  computed  by  normal  modes  can  be  found  in  prior  work  [93]. 
Media  profile:  Munk  profile  with  depth  0  —  5000m. 

Bathymetry:  conical  seamount  located  lOOfcm  from  the  source,  radius  of  the  base  20km,  and  two 
different  heights  1000?n  and  3800?n. 

Bottom  impedance:  fluid  half  space  with  compressional  speed  of  2000m/s,  density  of  1  g/cm3,  and 
attenuation  of  O.ldB/X. 

Results:  The  vertical  2D  TL  field  is  visualized  for  a  range  up  to  200km  and  a  depth  up  5km  in  Fig. 
5.5.  Our  ray  tracer  successfully  replicate  the  characteristics  of  the  field  for  this  underwater  benchmark  with 
extensive  volume,  as  compared  with  the  normal  mode  results  available  in  prior  work[93]. 


5.6  Application  on  General  Outdoor  Scene 

5.6.1  Scene  configuration 

We  generate  a  general  media  profile  based  on  an  empirical  model  of  the  atmosphere  [133].  The  acoustic 
index  of  refraction  in  the  atmosphere  n  =  cq/c ,  where  Co  is  the  reference  sound  speed,  is  modeled  with 
a  stratified  component  nstr  and  a  fluctuation  component  njiu,  so  that  n  =  nstr  +  nfiu ■  The  stratified 
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Range  (km)  x>o‘ 


Figure  5.5:  Benchmark  B  2D  field:  Source  depth  1000m,  frequency  50Hz.  2D  vertical  field  of  TL  is 
visualized  for  depth  up  to  5 km  and  range  up  to  200 km,  compared  to  Fig.  9 [93]. 


component  follows  a  logarithmic  profile  of  the  altitude  z:  nstr{z )  =  Co/(co  +  5  In  (j-  +  lj),  the  parameter 
zg  is  the  roughness  length  of  the  ground  surface,  and  typical  value  for  b  are  1  m/s  for  a  downward-refracting 
and  —1  m/s  for  upward-refracting  atmosphere.  The  fluctuation  component  at  position  x  can  be  computed 
as  n/z„(x)  =  G(ki)  cos(k,  •  x  +  tpi),  where  k,  is  the  wave  vector  describing  the  spatial  frequency  of  the 
fluctuation,  ipi  is  a  random  angle  between  [0,  27r],  and  G(ki)  is  a  normalization  factor. 

We  use  an  artificial  scene  consisting  of  undulating  terrains  that  depicts  a  reservoir  and  buildings.  A 
wireframe  rendering  of  the  scene,  and  the  two  sound  source  locations  marked  by  green  and  red  dots  can  be 
found  in  Fig.  5.6(a).  The  scene  has  a  physical  dimension  of  220m  x  150m  x  50m,  and  is  represented  by  4, 000 
triangular  surface  primitives.  Our  algorithm  can  simulate  propagation  for  any  scenes  that  can  be  modeled 
or  scanned  into  similar  surface  representations  as  demonstrated  here. 


5.6.2  Results 

The  diurnal  changes  in  the  atmosphere  typically  lead  to  an  upward  refractive  condition  during  the  day,  and 
a  downward  refractive  condition  at  night.  We  generate  the  sound  field  for  the  source  on  the  slope  of  the 
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Figure  5.6:  Outdoor  scene  and  field  results  for  two  source  locations:  (a)  Wireframe  rendering  of  the  Reservoir 
scene,  green  dot  represents  a  sound  source  located  on  the  slope,  and  the  red  dot  represents  a  sound  source  in  the 
valley,  (b)  Slices  of  sound  TL  field  visualized  for  the  green  source,  (c)  Slices  of  sound  TL  level  visualization  for  the 
red  source.  Frequency  10Hz. 
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Figure  5.7:  Vector  wind  for  source  in  the  valley:  (a)  Difference  in  horizontal  field  of  sound  pressure 
level  between  a  north  and  a  sound  wind,  (b)  Difference  in  horizontal  field  of  sound  pressure  level  between 
an  east  and  a  west  wind,  (c)  Difference  in  vertical  field  of  sound  pressure  level  between  an  east  and  a  west 
wind.  Frequency  10Hz. 
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Figure  5.8:  Source  on  the  slope:  (a)  Relative  sound  pressure  level  upward  vs  downward  refraction,  vertical  field, 
(b)  horizontal  field,  (c)  Relative  sound  pressure  level  up  wind  vs  down  wind  propagation,  vertical  field,  (d)  horizontal 
field.  Frequency  10Hz. 


reservoir,  and  a  source  in  the  center  of  the  valley,  for  upward  and  downward  refractive  conditions  respectively, 
and  visualize  the  difference  (Fig.  5.8(a,b)  and  Fig.  5.9). 

Wind  plays  an  important  role  in  atmospheric  sound  propagation,  creating  extra  variations  in  the  sound 
speed  profile,  and  interacts  with  physical  obstacles  that  further  complicates  the  sound  field.  For  the  sound 
source  on  the  slope,  we  also  simulate  the  sound  held  for  up-wind  and  down-wind  conditions  (Fig.  5.8(c,d)), 
which  yields  similar  patterns  as  the  relative  difference  between  upward  and  downward  refractive  media. 

With  our  ray  tracer  we  also  account  for  moving  medium  with  vector  wind  held,  and  we  show  this 


Figure  5.9:  Source  in  the  valley:  Relative  sound  pressure  level  upward  vs  downward  refraction,  (a) 
horizontal  and  (b)  vertical  held  that  passes  the  source  location.  Frequency  10Hz. 
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capability  with  the  sound  source  in  the  valley,  and  visualization  of  the  difference  in  sound  field  between  a 
west  and  a  east  wind,  and  between  a  north  and  a  south  wind,  respectively  (Fig.  5.7). 

For  the  benchmark  scene  tested  here  the  sound  field  displays  complex  characteristics  resulting  from  the 
interaction  of  sound  wave  with  the  media  and  the  obstacles.  Our  propagation  algorithm  can  produce  this 
complex  sound  field  close  to  interactively,  so  that  observations  can  be  made  while  varying  scene  geometry, 
media  profiles,  or  frequency.  We  show  a  stack  of  slices  of  the  sound  field  in  Fig.  5.6  for  visualization  purposes, 
while  the  full  3D  sound  field  is  computed  by  our  algorithm.  Details  on  performance  of  our  method  is  can  be 
found  by  referring  to  [101]. 


5.7  Discussion 

This  algorithm  is  complementary  to  many  existing  sound  propagation  techniques  and  can  be  extended  or 
combined  in  multiple  ways.  With  the  analytic  ray  curve  tracer[101]  as  a  component,  our  method  inherits 
its  potential  extensions,  including  augmenting  GA  methods,  hybrid  method  based  on  frequency  and  spatial 
subdivision  similar  to  Yeh  et  al.  [179] ,  and  fast  wide  area  assessment  using  the  ray  tracer  with  concentrate 
numerical  methods  in  areas  of  interests.  Furthermore,  extensions  [89,  121]  can  be  built  similar  to  other  ray 
models  to  account  for  turbulence.  It  is  also  possible  to  accommodate  sound  sources  other  than  a  point 
source,  using  the  techniques  of  Gaussian  beam  expansion  of  complex  or  sources  with  directivity[134,  81,  59]. 

As  a  ray  model  and  Gaussian  beam  model,  this  algorithm  inherits  the  limitations  being  a  high  frequency 
approximation,  not  a  full  wave  solution.  The  analytic  ray  tracer  relies  on  existence  of  spatial  coherence  in 
the  media  to  perform  efficiently.  The  Gaussian  beam  model  that  is  used  to  compute  the  sound  field  relies  on 
carefully  chosen  parameters  that  control  the  beam  width[16],  and  it  is  best  determined  on  a  per-scene  basis. 

5.8  Conclusion 

This  paper  combines  the  analytic  ray  curve  tracer  and  Gaussian  beam  model  to  form  an  efficient  solution 
for  outdoor  sound  field  computation.  Based  on  the  parabolic  ray  formulation[101],  analytic  solutions  that 
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computes  on-ray  pressure  as  well  as  paraxial  field  are  used  with  a  Gaussian  beam  model.  The  efficiency  of 
the  paths  computation  by  the  analytic  ray  tracer  is  matched  with  the  efficiency  of  the  pressure  computation, 
and  the  combined  algorithm  can  efficiently  simulate  the  propagated  sound  field  for  large  outdoor  scenes  with 
general  input  media  and  arbitrary  obstacles.  This  algorithm  is  validated  on  benchmarks  with  published 
results  generated  by  validated  methods  including  FDTD,  PE,  and  normal  modes.  Simple  benchmarks  are 
chosen  to  facilitate  the  verification  of  the  results,  but  they  cover  scenarios  that  span  inhomogeneous  media 
and  obstacles  with  atmospheric  and  underwater  propagation.  The  capability  of  this  algorithm  is  further 
demonstrated  with  a  complex  artificial  scene  and  a  variety  of  media  conditions  that  would  present  scalability 
challenges  to  existing  methods.  Results  that  reflect  the  characteristics  of  the  scene  and  media  are  generated  at 
close-to-interactive  speed.  As  future  work  we  hope  to  fully  extend  and  apply  this  algorithm  on  more  outdoor 
scenarios,  and  to  obtain  measured  data  or  to  run  large  scale  numerical  simulation  for  further  validation  of 
the  algorithm. 

5.9  Derivations 

5.9.1  Analytic  evolution  of  ray  trajectories 

With  ray  parameter  a  defined  by  dcr  =  V2d T  =  Eds,  the  ray  trajectories  Xi,  slowness  pi,  and  travel  time  T 
can  be  evolved  analytically  from  sigma o  to  any  sigma  along  the  ray: 

xt(cr)  =  xi0  +  Pio(cr  -  a0)  +  ^A:(o-  -  (Jo)2,  (5.7) 

Pi(a)  =  p.l0  +  ^A(cr  -  cr0),  (5.8) 

T(a)  =  T(a0)  +  V0 ~2(cx  -  cr0)  +  ^Aipi0(a  -  a0)2  +  ■^AiAi(a  -  a0)3.  (5.9) 
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5.9.2  Analytic  evolution  of  Cartesian  P  and  Q 


The  characteristic  system  of  the  Hamiltonian  for  of  the  Eikonal  equation  gives: 


d Pi  _  1  d  1 
clcr  “  2dXiV2’' 


d  T 
dcr 


=  PkPk  =  V  2 


(5.10) 


From  equation  5.10  and  because  partial  derivative  d/dj  commutes  with  A/ da,  a  simple  dynamic  ray 
tracing  system  can  be  derived  as  follows: 


d_ 

da 


_lpW  =  1  d"  ,  1 

da  '  2  dxidxj  V'-'  ’  ■'<  ’ 


(5.11) 


For  constant  media  gradient  of  V  2,  5.11  can  be  solved  analytically  for  any  point  R  at  a  along  the  ray  0  if 
Q1"'1!  and  are  known  at  any  other  point  S  at  CTq  along  the  ray  fl: 


P 


=  p(x) 


/Q\ 


'll*)  / 


5.9.3  Analytic  evolution  of  tranformation  matrix 

For  constant  gradient  of  V~2,  H  can  be  solved  analytically  for  any  point  R  from  any  other  point  S  along 
f2.  This  is  achieved  by  computing  the  ray-centered  coordinates  unit  basis  el,  el,  and  el  that  constitutes 
H.  Consider  a  set  of  orthonormal  unit  vectors  rn,  ri2,  res  defined  along  ray  parameterized  by  a.  Let 
nl(a)  =  V(a)p(a)  follow  the  tangent  of  the  ray,  712(a)  is  selected  to  be  perpendicular  to  the  ray  plane,  n\  is 
then  defined  by  n\  =  li 2  x  n%.  Because  the  ray  is  a  planar  curve  for  constant  gradient  V~2,  n\(a)  =  rii(ao). 
Given  Equations  5. 7-5. 9, 


712(a)  =  n\(a)  x  ii3(a)  =  rii(er)  x  V(a)p(a) 

=  ri[(a0)  x  V(a)(p(a0)  +  ^A(a-a0)), 


(5.13) 
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As  e3  coincides  with  n 3,  el  (a),  e^cr)  can  be  determined  from  el(cro),  e2(cro)  and  the  evolution  of  n\,  n2  from 


(To  to  a  is: 


el (<r)  =  [el(cr0)  •  rTi(cro)]nl(CT)  +  [el(cr0)  •  r^(<70)]r^(c7), 

el(cr)  =  [el(cr0)  •  rTi(cr0)]nl(CT)  +  [e2(a0)  ■  n2{a0)]n2{a).  (5.14) 


5.9.4  Evolution  of  ray-centered  P  and  Q 


1.  Take  initial  condition  for  P,  Q.  Assuming  a  point  source  S  and  </> 0  and  as  the  ray  parameters  71, 

(1  0) 


72:  Q (5)  =  0,  P (S)  = 


V(S) 


y0  sinio  J 

2.  Transform  P,  Q  into  P*^  and  with  H, 

3.  Analytically  evolve  P(x)  and  QA)  by  Equation  5.12, 

4.  Analytically  evolve  H  by  Equation  5.14, 

5.  Transform  the  evolved  P ^  and  back  to  P,  Q  with  the  evolved  HT . 


5.9.5  Cartesian  paraxial  travel  time 

Denote  the  Cartesian  coordinates  of  R'  and  R1  by  Xi(R')  and  x*(i?7),  and  Xi(R,R7)  =  Xi(R)  —  Xi(R^),  the 
quadratic  expansion  of  T  from  T(i?7)  is: 


T(R,  i?7)  =  T(Ry)  +  Xi(R,  R1)p[x\R-y)  +  -Xi(R,  R-y)xj(R,  Rj)M-x> (iZ7) 


1 


(5.15) 


where  Mi:j  are  the  elements  of  the  3x3  matrix  MW; 


M  (i?7)  =  H(i?7 


M(l?7) 


Ml3(Ay) 

M2z(Rry) 

\m13{R7)  M23(R-y)  ] 


H  t(U7). 


(5.16) 
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Here  M(i?7)  is  defined  in  Eq.  5.2.  The  remaining  elements  can  be  derived  [161]  to  be: 


M13(Ry)  =  -(v~2v1)R^  M23(RJ)  =  -(v-\2)Ry,  M33(R^)  = -(v~2v,  3)Ry,  (5.17) 

v  =  [V(qi,q2,  s)\qi=q2=0tS=siR^,  vti  =  [dV(q1,q2,s)/dqi)qi=g2=0tS=s{R^.  (5.18) 

vti  can  be  solved  by  transforming  to  Cartesian  coordinates  first:  vq,  =  dV/dqi  =  HkidV /  dxk ,  and  dV/dxk 
can  be  solved  analytically  for  constant  gradient  of  V~2  by: 

dv~2/dxk  =  -2V~3dv/dxk  =  Ak  =»  dv/dxk  =  -\v3Ak.  (5.19) 


5.9.6  Gaussian  beam  summation 

The  contributions  of  Gaussian  beams  are  then  summed  up  by  integral  superposition: 


p(i?,w) 


$(71  >  l2)Prav {R7)exp[iu]T(R,  i?7)]d7id72, 


(5.20) 


where  R  is  the  field  point  and  i?7  is  a  point  on  the  ray  7  of  the  ray  parameter  71,  72.  The  weighting  function 
$  is  derived  to  be: 


$(71,72)  =  (w/27r)[— det(M(i?7)  -  Ma(i?7))]1/2|detQa(i?7)|  (5.21) 

=  (w/27r)[-det(QaT(M  -  Ma)Qa)]1/2.  (5.22) 

The  choice  of  Re(M)  is  related  to  the  curvatures  of  the  wavefront  and  the  choice  of  Im(M)  is  related 
to  the  width  of  the  amplitude  profile.  They  can  be  specified  at  i?7  or  any  other  point  along  the  central  ray 
7  to  control  the  shape  of  the  beam. 
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Chapter  6 


Acoustic  pulse  propagation  in  an 
urban  environment  using  a 
three-dimensional  numerical 
simulation 

6.1  Overview 

Acoustic  pulse  propagation  in  outdoor  urban  environments  is  a  physically  complex  phenomenon  due  to  the 
predominance  of  reflection,  diffraction,  and  scattering.  This  is  especially  true  in  non-line-of-sight  cases,  where 
edge  diffraction  and  high-order  scattering  are  major  components  of  acoustic  energy  transport.  Past  work  [3] 
has  shown  that  many  of  these  effects  can  be  captured  using  a  two-dimensional  finite-difference  time-domain 
method,  which  was  compared  to  the  measured  data  recorded  in  an  army  training  village.  In  this  paper,  a 
full  three-dimensional  analysis  of  acoustic  pulse  propagation  is  presented.  This  analysis  is  enabled  by  the 
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adaptive  rectangular  decomposition  method  [123],  which  models  sound  propagation  in  the  same  scene  in 
three  dimensions.  The  simulation  is  run  at  a  much  higher  usable  bandwidth  (nearly  450  Hz),  and  took 
only  a  few  minutes  on  a  desktop  computer.  It  is  shown  that  a  three-dimensional  solution  provides  better 
agreement  with  measured  data  than  two-dimensional  modeling,  especially  in  cases  where  propagation  over 
rooftops  is  important.  In  general,  the  predicted  acoustic  responses  match  well  with  measured  results  for  the 
source/sensor  locations. 

6.2  Introduction 

Acoustic  propagation  in  urban  environments  is  a  physically  complex  problem  that  has  many  practical  appli¬ 
cations.  In  urban  planning  and  city  design,  acoustic  propagation  models  can  inform  decisions  on  the  location 
of  noise-sensitive  buildings  like  hospitals  and  schools  [78].  Accurate  computational  modeling  is  also  useful 
in  designing  baffles  near  areas  of  high  traffic  to  control  noise  levels  in  residential  neighborhoods  [96,  69]. 
Acoustic  modeling  is  also  useful  in  sound-source  localization:  numerous  sensors  are  placed  in  an  urban  en¬ 
vironment  to  detect  sound  events  and  calculate  the  sound  source’s  position  using  the  peak  arrival  times  of 
the  sound  waves.  This  computation  of  sound  source’s  position  can  be  used  for  gunshot  localization  for  crime 
control  in  urban  areas  [86]  and  in  many  military  applications  [4]. 

Acoustic  propagation  modeling  for  urban  areas  is  a  challenging  computational  problem  because  of  the 
complex  building  geometry  and  large  domain  size.  Higlr-order  diffraction  and  scattering  play  a  significant 
role  in  acoustic  energy  transport  in  urban  areas,  especially  in  cases  when  the  source  and  receiver  are  not 
in  line-of-sight.  Previous  work  in  the  field  has  mainly  focused  on  continuous  noise  sources  to  determine 
statistical  quantities  like  reverberation  time  and  noise  levels  [173,  29,  70].  However,  these  are  gross  acoustic 
parameters  and  do  not  give  a  detailed  view  of  the  actual  propagation.  Geometric  techniques  have  been 
used  to  evaluate  noise  levels  and  calculate  sound  propagation  in  urban  streets  [62,  77].  However,  due  to 
inherent  assumption  of  rectilinear  propagation  of  sound  waves,  modeling  wave  effects  such  as  diffraction  and 
interference  remains  a  significant  challenge  with  these  techniques. 
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Recent  work  in  numerical  techniques  has  focused  on  the  use  of  acoustic  pulse  propagation  techniques  in 
time  domain  to  get  detailed  characteristics  of  the  complex  propagation  effects  in  urban  scenes.  Time-domain 
pulse  propagation  is  preferred  in  urban  acoustic  modeling  as  it  gives  direct  insight  into  the  propagation 
by  producing  animations  of  pressure  wavefronts.  This  allows  one  to  quickly  inspect  the  propagation  path 
corresponding  to  dominant  peaks  in  the  response  at  a  given  sensor  location.  Recent  studies  used  a  finite- 
difference  time-domain  simulation  to  model  acoustic  pulse  propagation  and  compared  the  results  with  real- 
world  measurements  [90,  5,  3].  Those  studies  were  limited  to  2D  modeling  due  to  the  high  computational 
cost  and  memory  requirement  of  the  finite-difference  technique  for  this  large  domain  size. 

In  this  paper,  a  full  3D  analysis  of  acoustic  pulse  propagation  in  time  domain  is  presented.  Our 
analysis  is  made  on  a  virtual  3D  model  of  the  same  scene  as  the  prior  2D  investigation  [3] ;  this  3D  analysis 
is  made  computationally  feasible  by  using  Adaptive  rectangular  decomposition  (ARD)  [123],  an  efficient 
time-domain  numerical  solver,  allowing  us  to  model  propagation  in  this  scene  in  three  dimensions.  ARD 
is  much  more  compute-  and  memory-efficient  for  homogenous  media  than  the  finite-difference  time-domain 
technique.  The  improved  efficiency  allows  the  simulations  to  have  a  much  higher  usable  bandwidth,  up 
to  mid-range  frequencies  of  450  Hz,  compared  to  200Hz  in  prior  work  [3],  while  taking  just  a  few  minutes 
on  a  desktop  computer.  A  detailed  analysis  of  errors  between  measured  data  and  3D  simulated  data  is 
performed,  showing  that  3D  simulations  provide  better  agreement  with  measured  data  than  2D  simulations. 
The  agreement  is  markedly  better  in  cases  where  propagation  over  rooftops  is  important,  a  case  which  the 
2D  modeling  cannot  capture  at  all.  In  general,  the  predicted  acoustic  responses  match  well  with  measured 
results  for  most  source/sensor  locations,  with  typical  errors  being  on  the  order  of  3  dB.  Visualizations  of 
the  time-domain  simulation  show  that  a  rooftop-diffracted  path  provides  important  energy  contributions  at 
certain  locations  in  the  scene. 
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6.3  Previous  work 


Over  the  years,  many  techniques  have  been  developed  to  study  acoustic  propagation  in  urban  environ¬ 
ments  [78].  Analytical  solutions  are  available  for  simple  scenarios  involving  building  edges  and  noise  barri¬ 
ers  [116].  Theoretical  predictions  have  been  used  to  predict  the  noise  levels  in  urban  street  complexes  [87]. 
Statistical  analysis  [49,  114]  has  been  performed  on  measured  data  to  analyze  the  reverberation  time  and 
sound  levels  in  streets  and  to  study  the  relationship  between  different  noise  descriptors  in  urban  areas. 
Many  ray-tracing  based  approaches  [62,  70]  have  been  proposed  that  evaluate  the  increase  in  traffic  noise 
for  street  canyons  due  to  the  presence  of  buildings.  Kang  et  al. [77]  used  a  radiosity-based  model  to  calcu¬ 
late  sound  propagation  in  interconnected  urban  streets.  And  the  radiosity-based  model  has  been  combined 
with  the  image-source  method  to  handle  diffuse  and  geometrical  boundaries,  respectively,  for  street  canyon 
scenarios  [76]. 

Typical  numerical  approaches  utilized  to  study  urban  acoustic  propagation  are  Finite-Difference  Time- 
Domain  [23],  Finite  Element  Method  [152],  Boundary  Element  Method  [27],  Equivalent  Source  Method  [104] 
and  Pseudo-Spectral  Time-Domain  [25].  The  boundary  element  method  has  been  applied  to  acoustic  prop¬ 
agation  in  areas  with  noise  barriers  [120]  and  in  outdoor  scenes  [33].  To  model  sound  propagation  in  city 
canyons,  Ogren  and  Kropp  (2004)  used  the  equivalent  source  method  and  Van  Renterghem  et  al.  (2006)  used 
a  coupled  finite-difference-parabolic  equation  method  [108,  162],  Ovenden  et  al.  (2009)  coupled  the  analyt¬ 
ical  calculation  to  a  parabolic  equation  method  for  modeling  noise  propagation  in  urban  freeways  [111].  To 
model  atmospheric  sound  propagation,  a  pseudo-spectral  time-domain  (PSTD)  approach  [68]  was  proposed. 
The  finite-difference  approach  has  been  used  in  recent  years  to  model  acoustic  pulse  propagation  in  urban 
environments  and  compare  the  results  with  measured  waveforms  recorded  at  the  physical  site.  This  includes 
propagation  for  a  right-angled  wall  [90],  a  single  building  [5],  and  a  training  village  with  multiple  build¬ 
ings  [3].  However,  due  to  computational  limitations,  all  these  approaches  have  been  limited  to  propagation 
in  two  dimensions. 

Some  recent  studies  have  modeled  three-dimensional  sound  propagation.  Ketclram  et  al.  (2008)  used  a 
finite-difference  approach  for  modeling  the  effect  of  urban  infrastructure  on  sound  scattering  in  three  dimen- 
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sions  [80],  but  the  modeling  required  a  computer  cluster  with  hundreds  of  processors.  Polles  et  al.  (2004) 
proposed  a  diffusion-equation-based  approach  to  model  3D  sound  propagation  in  urban  areas  with  multiple 
buildings  [117].  Recently,  a  fast  and  efficient  time-domain  approach  was  proposed;  this  technique  called 
adaptive  rectangular  decomposition,  solves  the  wave  equation  in  three  dimensions  for  spatially  invariant 
speed  of  sound  [124,  100].  It  is  related  to  the  PSTD  technique  but  avoids  the  discrete  integration  in  time 
by  using  analytical  solutions  of  the  wave  equation  for  rectangular  domains.  For  a  more  detailed  survey  of 
outdoor  sound  propagation  techniques,  the  survey  paper  [174]  is  recommended. 

6.4  Measurements 

In  this  section,  we  discuss  the  real-world  measurements  used  in  the  validation  of  the  numerical  simulation. 
This  dataset  was  presented  in  the  work  of  [3].  We  provide  a  brief  discussion  here,  but  more  details  can  be 
found  in  their  previous  work. 

Scene  Layout  The  experiment  was  conducted  in  an  artificial  village  spanning  a  150  x  150  m2  area  with 
15  buildings  and  two  cross-streets:  “Main  street”  running  nearly  horizontal,  and  “Church  street”  running 
in  an  approximately  vertical  direction.  Figure  6.1(a)  shows  the  2D  layout  (top  view)  of  the  urban  scene. 
The  buildings  in  the  village  were  two  or  three  stories  tall  and  made  up  of  concrete  blocks.  The  ground  areas 
consisted  of  streets,  grass  areas,  and  hard-packed  soil. 

Weather  Conditions:  The  experiment  was  conducted  over  two  sunny  days  with  temperature,  wind 
and  relative  humidity  variation  between  8  to  19°  C,  2  to  5  m/s  and  30%-50%  respectively. 

Sources:  Acoustic  pulses  were  produced  by  using  small  explosives  of  0.57  kg  of  C4  suspended  at  a 
height  of  1.5  m  from  the  ground.  The  measurements  were  recorded  for  four  source  positions  SP1-SP4. 

Receivers:  Sensors  were  placed  at  14  different  receiver  positions  spread  throughout  the  scene,  in 
both  line-of-sight(LOS)  and  non-line-of-sight(NLOS)  positions.  These  sensors  were  connected  to  digital 
seismographs  that  recorded  the  pressure  signal  at  a  sampling  rate  of  5  or  8  kHz. 
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Figure  6.1:  a)  Top  view  of  the  urban  scene  used  in  the  experimental  study.  Reproduced  with  permission 
from  Liu  and  Albert  (2010).  b)  An  approximate  3D  model  of  the  scene  constructed  based  on  the  2D  layout, 
photographs  of  the  scene  and  heights  of  the  buildings  corners  and  roof  tops.  (Color  online.) 
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6.5  Adaptive  rectangular  decomposition  numerical  modeling 


In  this  section,  we  give  an  overview  of  the  adaptive  rectangular  decomposition  (ARD)  simulation  technique 
for  modeling  acoustic  pulse  propagation  [124,  100,  122]. 

6.5.1  The  adaptive  rectangular  decomposition  method 

Our  starting  point  is  the  wave  equation  for  constant  sound  speed, 

-c2V2p  =  /(x,t),  (6.1) 

where  p  (x,  t)  is  the  time- varying  pressure  field,  /  (x,  t)  is  the  force  term  corresponding  to  the  volume  sound 
sources,  c  is  the  speed  of  sound  and  V2  is  the  Laplacian  operator.  The  speed  of  sound  in  the  medium  is 
assumed  to  be  spatially  invariant. 

Figure  6.2  demonstrates  the  main  stages  of  the  ARD  pipeline.  The  technique  starts  with  a  3D  model 
of  the  scene,  voxelizes  the  air  volume,  and  then  decomposes  the  voxelization  into  rectangular  partitions. 
The  wave  equation  has  a  known  analytical  solution  for  rectangular  domains  for  spatially-invariant  speed  of 
sound.  Consider  a  rectangle  in  3D  of  size  (lx,  ly,  lz)  with  perfectly  reflecting  walls.  The  analytical  solution 
of  the  wave  equation  in  this  case  can  be  written  as: 

P(x,t)  =  ^2  "bh)$i(x),  (6.2) 

i=(ix  ) 

where  ix  are  x- indices  in  the  range  [1  —  lx\,  and,  iy  and  iz  are  y-and  z-indices,  respectively.  Here  i  = 
(■ ixiiyiiz )  is  a  generalized  index  over  three  dimensions,  m,(t)  are  time- varying  mode  coefficients,  and  $*(x) 
are  eigenfunctions  of  the  Laplacian,  given  by 

/  \  ( JTg  \  (  ll  'dy  \  ( n iz  \  . 

®»(x)  =  cos  y-j— x J cos  \-j—y J cos  \  T~ z J  ’  6'3 

for  a  perfectly-reflecting  boundary  condition.  In  discrete  interpretation,  pressure  can  be  transformed  into 
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Figure  6.2:  Different  stages  of  the  Adaptive  Rectangular  Decomposition  simulator.  (Color  online.) 
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mode  coefficients  Mi  at  each  time  step  by  a  Discrete  Cosine  Transform  (DCT),  since  the  eigenfunctions  are 
cosines.  The  update  rule  for  mode  coefficients  can  be  derived  by  taking  a  DCT  of  the  wave  equation  and 
solving  the  resultant  simple  harmonic  oscillator  system,  giving 

2  Wn 

M"+1  =  2M"  cos(wiAt)  —  M”_1  H - ^-(l  —  cos(wiAt)),  (6-4) 

where  the  superscript  indicates  the  number  of  time  steps,  wt  =  cttiJ (jf  +  ,  Fn  is  the  DCT  of  force 

/  (x,  t)  at  nth  time  step  and  At  is  the  size  of  the  time  step.  Mode  coefficients  are  then  transformed  back 
into  pressure  by  an  inverse  DCT.  This  gives  the  pressure  inside  each  rectangular  partition.  The  pressure 
is  propagated  across  neighbouring  partitions  by  performing  interface  handling  using  a  sixth-order  finite- 
difference  stencil.  To  incorporate  sound  absorption  at  the  partition  boundaries,  Perfectly  Matched  Layer 
(PML)  absorbing  boundary  conditions  are  used.  Currently,  the  ARD  simulator  can  handle  absorption  and 
reflection  of  sound  while  ignoring  transmission  through  objects.  For  more  details,  please  refer  to  the  original 
texts  [124,  100]. 

The  ARD  technique  is  more  efficient  than  the  FDTD  technique  because  of  its  larger  grid  spacing  and 
time  steps.  The  grid  spacing  for  the  ARD  technique  is  h  =  c/(r,maxs),  where  i<max  is  the  maximum  simulation 
frequency  and  s  =  2.6  is  number  of  samples  per  wavelength.  The  ARD’s  simulation  time  step  is  restricted 
by  the  Courant-Friedrichs-Lewy  (CFL)  condition  At  <  h/(c\f?>).  In  contrast,  the  FDTD  technique  requires 
a  much  higher  value  of  samples  per  wavelength  (s  =  10  used  in  Taflove  et.  al  [145]  or  s  =  20  used  in 
Albert  et.  al  [3]),  resulting  in  much  denser  grid  and  smaller  time  steps.  Therefore,  the  ARD  technique  is 
computationally  more  efficient  and  requires  less  memory  than  the  FDTD  technique.  This  efficiency  enables 
the  ARD  technique  to  perform  3D  wave-simulations  on  large,  complex  scenes  at  a  higher  simulation  frequency 
than  FDTD  can,  all  on  a  desktop  computer. 

The  ARD  technique  has  few  intrinsic  limitations,  the  primary  one  being  the  spatially  invariant  speed 
of  sound.  However,  sound  speed  can  change  spatially  due  to  many  factors  such  as  temperature  gradient, 
humidity,  or  wind.  The  ARD  technique  does  not  model  the  effect  of  these  factors  on  sound  propagation. 
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Another  limitation  is  that  atmospheric  absorption  is  currently  not  modeled  in  the  simulation.  Also,  the 
simulation  does  not  model  sound  transmission  through  the  objects  (walls,  buildings,  etc).  In  a  general 
scenario,  these  limitations  can  have  an  effect  on  the  quality  of  the  simulation  results.  However,  in  the  present 
study,  these  limitations  have  negligible  effect  on  the  prediction  quality.  The  sensors  are  placed  at  a  height 
of  1.5m  and  most  of  the  energy  recorded  at  these  sensors  is  due  to  the  acoustic  propagation  happening  close 
to  the  ground  (<  15m).  For  such  small  elevations,  the  temperature  gradient  of  the  atmosphere  is  negligible 
and  speed  of  sound  remains  unaffected  due  to  temperature  gradient.  The  wind  speed  is  also  very  low  (2-5 
m/s)  to  have  any  significant  effect  on  arrival  times  of  the  acoustic  pulses.  Also,  atmospheric  absorption  can 
be  safely  ignored  in  this  case  as  the  simulation  frequency  is  less  than  500Hz  and  the  propagation  distances 
are  of  the  order  of  hundreds  of  meters  for  which  the  intrinsic  absorption  of  the  atmosphere  is  negligible.  As 
for  sound  transmission,  due  to  the  very  high  impedance  contrast  between  the  air  and  the  concrete  buildings, 
very  little  acoustic  energy  can  get  transmitted  through  the  buildings. 

6.5.2  Validation 

We  provide  validation  results  of  the  ARD  technique  on  two  benchmark  test-cases:  (a)  spherical  wave  scat¬ 
tering  by  a  rigid  sphere  and  (b)  edge-diffraction  from  a  right-angled  rigid  wall.  In  the  first  case,  the  acoustic 
wave  equation  has  known  analytical  solution  [57].  The  scene  setup  is  as  follows:  a  sphere  of  radius  a  =  1  m, 
surrounded  by  air  with  speed  of  sound  343 m/s  and  mean  density  of  1.21  kg/mr,  is  centered  at  origin  (0,  0,  0). 
A  spherical  sound  source  (monopole  source)  is  placed  at  position  (0,0, —3).  The  spherical  wave  emitted  by 
the  source  is  scattered  by  the  rigid  sphere.  The  total  field  (incident -l-scattered  field)  is  computed  using  the 
analytical  solution  of  the  wave  equation  at  an  angular  distribution  of  listener  positions  at  distance  of  1.5m. 
The  analytical  solutions  are  compared  against  the  simulation  results  at  different  wave  numbers  k  as  shown 
in  Figure  6.3.  The  results  are  plotted  versus  the  polar  angle  9  where  9  =  180°  corresponds  to  the  front  end 
of  sphere  with  respect  to  the  incoming  spherical  wave.  The  comparison  between  the  analytical  expressions 
and  the  ARD  simulation  results  show  very  good  agreement. 

In  the  second  case,  we  perform  validation  of  the  ARD  technique  by  comparing  it  against  the  edge 
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diffraction  model  proposed  by  Svensson  et  al.  [144].  This  model  is  an  extension  of  Biot-Tolstoy-Medwin 
solution  [97]  to  finite  edges.  The  scene  setup  is  as  follows:  a  right-angled  rigid  wall  of  dimension  8 m  x  12m 
is  considered  with  the  longer  edge  being  the  diffraction  edge.  Source  and  receiver  are  placed  at  symmetric 
positions  with  respect  to  the  wall  at  (—1.8, —0.9, —6.0)  and  (0.9, 1.8, —6.0),  respectively.  The  time-  and 
frequency-domain  responses  are  computed  using  the  BTM  finite-edge  diffraction  model  and  compared  against 
the  results  of  the  ARD  simulation.  As  shown  in  Figure  6.4,  the  agreement  between  the  two  responses  is  very 
good. 

6.5.3  Source  waveform 

The  source  function  used  to  model  the  explosive  blast  signal  for  calculations  in  the  ARD  simulator  is  described 
in  [90].  Figure  6.5  shows  the  corresponding  source  function  with  peak  pressure  normalized  to  1. 

6.5.4  3D  model 

In  order  to  run  a  3D  numerical  simulation,  a  virtual  3D  model  of  the  scene  is  required.  However,  a  detailed 
3D  model  cannot  be  constructed  due  to  the  lack  of  availability  of  architectural  blueprints  or  a  laser-scanned 
point  cloud  of  the  site.  Therefore,  we  construct  a  simplified  3D  model  of  the  scene  using  a  2D  layout  of  the 
village,  photographs,  and  the  heights  of  corners  and  rooftops  of  buildings.  This  3D  model  is  an  approximation 
to  the  actual  geometry  of  the  scene,  since  it  lacks  particular  geometric  details  (peaked  roofs,  door/window 
locations,  facade  details,  and  extraneous  geometry  such  as  cars  and  a  fountain).  The  dimensions  of  the 
simulation  domain  are  I75m  x  140m  x  14m.  The  heights  of  the  building  are  between  6  —  9m.  Therefore, 
depending  on  the  building,  a  vertical  space  of  5  —  8m  is  above  the  roof  to  the  top  of  the  simulation  domain 
allowing  correct  simulation  of  rooftop  diffraction.  Figure  6.1(b)  is  a  textured  rendering  of  the  3D  model. 
Based  on  the  type  of  material  present  (e.g.  concrete,  grass,  soil,  etc.),  we  assign  the  appropriate  absorption 
coefficients  to  the  surfaces  of  the  3D  model. 
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6.5.5  Simulation  parameters 


The  ARD  simulation  was  run  with  an  acoustic  wave  velocity  of  375?ns^1  and  an  air  density  of  1.2kgm~3 . 
The  high  value  for  the  acoustic  wave  velocity  comes  from  the  propagation  of  the  high-amplitude  acoustic 
pulse  generated  by  the  C4  explosive  used  as  the  sound  source.  In  case  of  concrete,  the  acoustic  wave  velocity 
is  2950?ns_1  and  density  is  2300kgm~3 ,  which  results  in  a  reflection  coefficient  of  0.99  for  concrete.  These 
parameters  correspond  to  the  values  used  in  Albert  and  Liu  (2010)  for  the  finite-difference  simulation. 

We  run  the  ARD  simulator  to  propagate  the  acoustic  pulse  from  each  given  sound  source  position,  one 
by  one.  We  record  the  response  at  the  specified  receiver  positions  and  compare  the  results  with  the  recorded 
measurement  data. 

6.5.6  3D  vs  2D  wave  simulation 

We  discuss  the  advantages  of  the  3D  wave  simulator  over  the  2D  wave  simulator  for  acoustic  pulse  propaga¬ 
tion.  Firstly,  a  3D  wave  simulation  incorporates  propagation  paths  over  the  top  of  walls  or  buildings,  as  well 
as  wave  diffraction  from  the  upper  edges,  both  of  which  are  completely  ignored  by  a  2D  simulation.  Secondly, 
in  3D  simulation,  the  sound  reflection  from  the  ground  terrain  is  handled  accurately  for  all  frequencies.  For 
a  2D  simulation,  the  pressure  is  simply  doubled  to  approximate  ground  reflections,  which  is  accurate  only 
for  frequencies  up  to  600 Hz,  as  discussed  in  Liu  et  al.(2006).  Lastly,  the  results  of  a  2D  simulation  must 
be  renormalized  by  an  additional  factor  of  1/ y/r  to  account  for  3D  geometric  spreading.  This  normalization 
is  valid  only  for  large  kr  (where  k  is  the  wave  number  and  r  is  distance  to  the  source).  A  3D  simulation 
requires  no  such  normalization. 

6.6  Results 

In  this  section  we  compare  the  waveforms  calculated  by  the  ARD  simulator  with  the  measurements  recorded 
in  the  village,  and  with  the  2D  FDTD  simulation  proposed  by  Liu  and  Albert  (2010). 
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Table  6.1:  Parameters  for  the  ARD  technique. 


Parameters 

Values 

simulation  frequency 

450Hz 

grid  size 

175m  x  140m  x  14m 

grid  spacing 

0.31  m 

#  grid  points 

11  million 

time  step  size 

385  /is 

#  time  steps 

2000 

simulation  length 

0.77  s 

6.6.1  Simulation 

The  ARD  simulations  were  run  for  the  four  source  positions  SP1  to  SP4  shown  in  Figure  6.1(a).  The 
parameters  used  in  the  simulator  are  given  in  Table  6.1.  The  total  simulation  time  for  each  source  was  20  mins 
for  the  CPU-based  ARD  implementation  [124]  and  1  —  2  mins  for  the  GPU-based  ARD  implementation  [100]. 
The  CPU-based  implementation  is  C++  and  the  GPU-based  implementation  uses  NVIDIA’s  CUDA.  These 
timing  results  were  measured  on  a  single  core  of  a  4-core  2.80  GHz  Xeon  X5560  desktop  machine  with  4  GB 
of  RAM  and  on  a  NVIDIA  GeForce  GTX  480  GPU  with  448  CUDA  cores  and  1.5  GB  memory. 

The  ARD  responses  were  computed  for  four  source  positions,  up  to  a  maximum  frequency  of  450  Hz. 
The  measured  waveforms  were  low-passed  to  450  Hz  for  source  positions  2  and  3  to  compare  with  the 
calculated  ARD  waveforms.  For  source  position  1  and  4,  both  the  ARD  and  the  measured  responses  were 
low-passed  to  200  Hz,  since  the  2D  FDTD  waveforms  are  available  only  up  to  200  Hz.  In  this  scene,  the 
simulation  frequency  is  less  than  500 Hz  and  the  propagation  distances  are  of  the  order  of  hundreds  of  meters, 
for  which  the  intrinsic  absorption  of  the  atmosphere  is  negligible.  Therefore,  we  have  ignored  atmospheric 
absorption  during  the  simulation. 

Figure  6.6  shows  the  visualization  of  the  time-domain  ARD  simulation  at  specified  time  steps.  These 
visualizations  show  the  propagation  of  wave  fronts  in  the  scene  and  how  they  are  modified  by  the  multiple 
reflections  and  diffractions  from  the  buildings,  and  reflections  from  the  ground.  These  can  be  helpful  in 
guiding  engineering  modifications  to  the  scene. 
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6.6.2  Varying  propagation  speed 


In  the  training  village  scene  considered  in  this  study,  the  sound  source  used  is  a  C4  explosive.  The  high- 
amplitude  explosion  caused  by  this  source  generates  a  varying  sound-speed  profile.  Figure  5  in  Liu  and 
Albert  (2010)  shows  the  measured  values:  over  400  m/s  a  meter  or  two  away  and  375  m/s  at  about  20 
m  distance.  The  speed  varies  with  the  amplitude  of  the  traveling  pulse,  reducing  with  distance.  To  make 
things  more  complicated,  the  amplitude  of  a  wave  that  diffracts  around  a  building  corner  is  reduced,  and 
that  diffracted  wave  travels  slower  than  waves  at  the  same  propagation  distance  that  are  in  line-of-sight 
paths.  This  generates  a  complicated  wave-speed  profile  that  is  harder  to  model.  The  peak  arrival  times  can 
get  shifted  in  the  measured  waveform  as  compared  to  the  calculated  waveforms,  which  assume  a  constant 
speed  of  sound.  Also,  the  individual  peaks  can  get  stretched  due  to  decreasing  wave  speed.  Therefore,  there 
are  complicated  changes  in  acoustic  wave  speed  even  for  different  waves  at  the  same  location. 

Similar  to  the  2D  finite  difference  simulation  of  Albert  and  Liu  (2010),  our  3D  ARD  simulation  also 
did  not  allow  us  to  track  these  changes  with  amplitude,  one  would  have  to  follow  the  individual  wavefronts 
to  do  that  correctly,  or  modify  the  code  to  explicitly  include  non-linear  effects.  Instead,  we  chose  a  constant 
wave  speed  to  get  the  best  waveform  fits  when  compared  with  the  measured  data.  Also,  similar  to  Albert 
and  Liu  (2010),  we  perform  time-shifting  to  the  response  of  linear  simulation  to  align  the  first  arrival  peak 
with  the  measured  waveform.  In  addition  to  that,  we  try  to  account  for  these  kinematic  errors  due  to  the 
varying  sound  speed  with  the  “robust”  error  metric  discussed  later. 

6.6.3  Error  metrics 

Basic  error  metric 

In  order  to  perform  a  quantitative  evaluation  between  the  measured  and  the  calculated  waveforms,  two 
types  of  error  metrics  are  used  in  the  comparison:  the  spectrogram  difference  metric  and  the  average  dB 
error  metric.  The  spectrogram  difference  metric  (SDM)  is  computed  on  the  spectrograms  SPEC  of  the 
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time-domain  pressure  signals  {a}^1  and  {5}^: 


SDM(a,  b) 


Ejii  ELi  II SPEC(a)jk  -  SPEC{b)jk\\2 
T7=iT.Li\\SPEC{a)jkr 


(6.5) 


where  M  =  {N / 2  if  N  is  even  or  N/2  +  1  if  odd}  and  T  is  the  number  of  time  segments  in  spectrogram. 

The  average  dB  error  metric  (ADM)  is  computed  on  the  pressure  signals  in  the  dB  scale  {dB(a)}P=1 
and  {dB(b)}^L1  as  follows: 

N 

ADM  (a,  b)  =  J2  II  dB(a)i  -  dB(b)i\\/N.  (6.6) 

i— 1 

Robust  error  metric 

The  error  metrics  defined  above  are  very  sensitive  to  the  time  of  arrival  in  the  waveforms.  In  scenarios 
where  the  speed  of  sound  is  constant,  these  metrics  perform  well.  However,  in  this  case  (as  discussed  in 
Section  6.6.2)  where  the  speed  of  sound  varies,  the  arrival  times  can  be  off  by  few  milliseconds.  Even  though 
this  small  time-shift  might  not  cause  a  big  difference  in  individual  waveform  characteristics  such  as  shape, 
frequency  content,  etc,  it  can  generate  a  large  error  with  these  error  metrics.  One  can  think  of  using  regular 
cross  correlation  to  remedy  this  situation.  However,  regular  cross  correlation  will  match  the  peak  arrival  but 
will  misalign  the  rest  of  the  measured  waveform. 

We  propose  changes  to  the  above  error  metrics  that  will  make  them  more  robust  to  the  small  time- 
shifts  and  peak  stretching  caused  by  varying  sound  speeds.  Our  proposed  change  is  based  on  the  idea  of 
Dynamic  Time  Warping  (DTW),  which  is  a  standard  tool  in  signal-processing  community  to  handle  non¬ 
linear  transformations  in  the  time  axis.  In  the  DTW  technique,  the  two  input  time  signals  are  allowed  to 
shift,  stretch,  or  contract,  in  a  limited  manner  to  generate  the  most  optimal  matching  between  the  signals. 
This  technique  has  been  widely  used  in  the  areas  of  speech  processing  [132],  acoustics  [95],  bioinformatics  [1], 
and  medicine  [30]. 

In  our  proposed  solution,  we  align  the  first  arrival  peaks  of  the  calculated  and  measured  waveforms. 
This  removes  any  time-shift  that  happens  before  the  arrival  of  the  first  peak.  Next,  we  take  these  signals 
and  apply  the  DTW  technique  to  align  the  remaining  part,  thus  taking  into  account  small  time  shifts 


142 


and  stretching.  Finally,  the  above  error  metrics  are  applied  on  these  aligned  signals  to  give  a  quantitative 
measure  of  the  error;  we  also  define  at  the  same  time  the  confidence  we  have  in  the  resulting  measure.  The 
confidence  measure  is  based  on  the  intuition  that  the  less  DTW  warping  required  to  align  the  signals,  the 
more  confidence  we  have  in  the  similarity  of  the  signals. 

Confidence  =  (1  —  dw/l0)  x  100  (in  percent), 

where  dw  =  \lw  —  lQ\  is  the  difference  in  the  length  of  the  warped  signal  lw  and  the  original  signal  la.  For  all 
the  results  shown  in  the  paper,  we  allow  a  warping  length  change  of  only  5  —  10%,  resulting  in  a  confidence 
measure  of  90  —  95%. 

6.6.4  Comparison  with  measurements  in  time  domain 

In  this  section,  we  compare  the  waveforms  calculated  using  the  ARD  simulation  with  the  measured  waveforms 
for  different  source  and  receiver  positions.  Note  that  waveform  modeling  requires  a  strong  agreement  between 
the  amplitudes  and  phases  of  the  calculated  and  measured  waveforms,  making  it  a  stringent  test  for  any 
acoustic  pulse  propagation  technique. 

Figure  6.8  compares  the  ARD  waveforms  and  the  measurements  for  the  source  SP2  and  the  receiver 
positions.  The  upper  traces  in  each  panel  correspond  to  the  calculated  ARD  waveforms,  and  the  lower 
traces  correspond  to  the  measurements.  The  source  is  located  to  the  left  of  the  narrow  street  canyon  formed 
between  buildings  A  and  E  (see  Figure  6.1(a)).  For  all  the  line-of-sight  (LOS)  positions  (R01  thru  R06),  we 
get  an  excellent  match  between  the  calculated  and  measured  waveforms  for  the  direct  sound  (first  arrival) 
and  the  subsequent  reflections.  The  main  characteristic  of  LOS  responses  is  the  strong  first  arrival  which 
dominates  the  signal,  followed  by  (comparatively)  weaker  reflections.  For  receivers  very  close  to  the  source 
(R01  thru  R04),  the  high  amplitude  of  direct  sound  completely  dominates  the  later  reflections,  whereas  for 
receivers  far  away  (R05,  R06),  the  reflections  have  considerable  energy. 

For  non-line-of-sight  (NLOS)  positions,  the  waveform  characteristic  is  position-dependent  and  more 
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complex.  In  the  case  of  receiver  R20,  the  propagation  paths  mainly  consist  of  diffraction  around  and  from 
the  top  of  building  E.  For  receiver  R08,  the  diffracted  first  arrival  is  followed  by  high-order  reflections  from 
buildings  C,  H,  D  and  I.  In  the  case  of  receiver  R15,  diffracted  arrivals  around  and  from  the  top  of  buildings 
A  and  B  are  followed  by  high-order  reflections  and  diffractions  from  A,  F  and  B.  Similarly,  for  R09,  the 
first  arrival  corresponds  to  diffraction  from  the  rooftop  of  building  A  followed  by  high-order  reflections  and 
diffractions  from  buildings  A  and  E.  For  receivers  R17  and  R18  the  first  arrival  is  via  diffraction,  followed 
by  multiple  reflections  trapped  between  buildings  F  and  G.  In  the  case  of  R17,  the  diffraction  angle  is  10-15 
degrees,  resulting  in  a  high  amplitude  diffraction  peak;  in  R18,  by  contrast,  the  diffraction  angle  is  90  degrees, 
which  results  in  a  low  amplitude  diffraction  peak. 

As  can  be  seen  in  Figure  6.8,  the  calculated  waveforms  incorporate  all  the  features  and  match  with 
the  measured  waveforms  to  a  high  degree  of  accuracy.  The  biggest  mismatch  between  the  waveforms  is  for 
sensor  R12,  which  is  a  NLOS  position  around  the  corner  of  the  building  A  on  Main  Street.  In  this  case,  the 
ground  floor  room  at  the  corner  had  two  open  windows  facing  Main  Street,  and  one  open  window  around 
the  corner  between  Main  Street  and  the  sensor  at  R12.  This  resulted  in  a  shorter  path  through  that  room 
to  the  sensor  for  sound  coming  from  source  positions  SP2.  This  is  probably  what  can  be  seen  before  the 
large  arrival,  which  presumably  diffracted  around  the  building  corner  itself.  The  open  windows  had  no  glass 
at  all  and  were  about  2x4  feet  in  area,  so  this  small  size  (compared  to  a  wavelength  of  about  7  meters 
at  the  source)  would  reduce  the  amount  of  energy  traveling  on  that  shorter  “indoor”  path.  Some  of  the 
additional  high  frequency  arrivals  later  on  in  the  measured  waveform  may  be  caused  by  reverberation  inside 
that  room.  Due  to  lack  of  availability  of  window  positions  data,  these  were  not  included  in  the  virtual  3D 
model  constructed  for  the  ARD  simulation.  Therefore,  these  early  arrivals  are  not  modeled  in  the  simulation 
results.  Same  behavior  is  observed  for  source  positions  SP1  and  SP3. 

In  terms  of  basic  error  metric  (not  shown  in  the  figure),  the  highest  value  for  this  source  simulation 
occurs  at  receiver  R18  (basic  SDM  error=1.67)  due  to  a  decrease  in  the  wave  speed  after  the  first  diffraction 
from  building  F.  This  causes  subsequent  strong  reflection  from  the  opposite  building  G  to  arrive  much  later  in 
time  than  the  calculated  waveform  (which  assumes  a  constant  sound  speed  STBms^1).  This  time  stretching 
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cannot  be  modeled  by  a  constant  time  shift  applied  at  the  beginning  of  the  signal.  Thus,  the  measured 
waveforms  for  R18  appear  to  be  the  stretched  equivalents  of  the  calculated  waveforms;  this  results  in  a 
higher  error  using  the  basic  metric.  The  DTW-based  robust  error  metric  takes  this  stretching  into  account, 
correctly  predicting  a  low  error  as  shown  in  the  figure.  Similar  behavior  can  be  observed  for  receivers  R16 
and  R17. 

In  Figure  6.9,  we  perform  the  same  comparison  between  calculated  and  measured  waveforms  for  the 
source  position  SP3.  All  the  LOS  positions  (R01  to  R06,  and  R20)  exhibit  excellent  matches  between  the 
calculated  and  measured  waveforms.  For  NLOS  positions  involving  first-order  diffraction  (i.e.  R09),  the 
calculated  responses  incorporate  diffraction  paths  from  both  around  and  on  top  of  the  buildings  to  generate 
the  correct  waveform.  For  R08,  the  first  diffraction  arrival  is  weaker  than  the  later  reflection;  this  behavior 
is  corrected  modeled  by  the  calculated  ARD  response.  As  described  before,  the  waveforms  of  receivers  R16, 
R17,  R18  get  stretched  in  time  by  the  variable  acoustic  speed;  this  stretching  results  in  high  error  with 
the  unwarped  metric  and  low  error  with  DTW-based  metric.  Aside  from  sensor  R12,  the  biggest  mismatch 
between  the  waveforms  is  for  receiver  R15  for  which  the  calculated  waveform  shows  two  distinct  peaks  as 
compared  to  only  one  peak  for  the  measured  waveform.  One  possible  explanation  is  that  the  arrival  times 
between  the  two  diffraction  peaks  shown  in  the  calculated  response  is  smaller  in  the  real  scene  (due  to  varying 
speed  of  sound),  resulting  in  a  constructive  interference  and  peak  merging  in  the  measured  waveform. 

6.6.5  Comparison  with  2D  FDTD 

In  this  section,  we  compare  the  calculated  ARD  and  FDTD  waveforms  to  the  measured  waveforms  for  two 
source  positions,  SP1  and  SP4.  The  ARD  waveforms  are  calculated  by  running  a  3D  simulation  on  the  3D 
model.  The  FDTD  waveforms  are  calculated  by  running  a  2D  finite-difference  simulation  on  a  2D  grid  as 
described  in  Liu  and  Albert  (2006). 

Figure  6.7  shows  the  calculated  and  measured  waveforms  for  the  source  position  SP1  and  its  receiver 
positions.  The  upper  traces  in  each  panel  correspond  to  the  calculated  ARD  response,  the  middle  trace  to 
the  measured  waveform,  and  the  lower  traces  to  the  calculated  FDTD  responses.  In  case  of  LOS  positions 
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(R01  to  R06),  the  dominant  propagation  happens  in  the  XY  plane  containing  the  sources  and  receivers. 
Therefore,  the  waveforms  calculated  using  the  3D  ARD  simulation  and  the  2D  FDTD  simulation  match 
equally  well  to  the  measured  waveforms.  The  main  difference  between  a  fully  3D  and  a  2D  simulation  arises 
in  cases  where  the  sound  waves  diffract  from  the  rooftops  of  the  buildings,  resulting  in  shorter  propagation 
paths  and  higher  energy  (as  illustrated  in  Figure  6.10).  For  receiver  R09,  the  diffraction  path  from  the 
top  of  building  A  is  the  shortest  path  and  corresponds  to  the  first  arrival.  This  shortest  path  is  modeled 
correctly  by  the  3D  ARD  simulation.  In  the  case  of  receiver  R20,  the  secondary  arrival  peak  corresponds  to 
the  rooftop  diffraction  path,  which  is  missing  in  the  2D  simulation  waveform.  For  receiver  R15,  the  energy 
from  rooftop  diffraction  paths  is  missing  from  the  2D  simulation  but  not  from  the  3D  simulation.  Therefore, 
for  these  NLOS  cases,  the  3D  simulation  waveforms  match  far  better  with  the  measured  waveforms  than  the 
waveforms  from  the  2D  simulation. 

In  Figure  6.11,  we  do  a  similar  comparison  for  source  position  SP4.  This  source  is  positioned  outside 
the  main  village  compound,  and  most  of  the  receiver  positions  are  non-line-of-sight.  The  only  LOS  position 
is  receiver  R20,  where  both  the  calculated  waveforms  match  well  with  the  measured  waveforms.  For  NLOS 
positions,  the  measured  waveforms  are  again  stretched  as  compared  to  the  calculated  waveforms,  as  discussed 
before.  The  speed  of  sound  reduces  significantly  after  diffraction,  resulting  in  high-order  propagation  peaks  to 
arrive  later  in  the  measured  waveforms  than  in  ARD  and  FDTD  waveforms  (which  assume  constant  speed  of 
sound).  In  the  case  of  receivers  R04  to  R06  and  R15  to  R18,  the  correct  modeling  of  rooftop  diffraction  with 
a  3D  simulation  (ARD)  results  in  a  better  match  with  the  measured  waveforms.  The  2D  FDTD  simulation 
ignores  these  paths,  resulting  in  a  lower  first  arrival  energy  than  in  the  measured  data.  The  measurement 
data  for  receiver  position  R1  is  not  available  for  the  source  position  SP4.  As  described  in  Liu  and  Albert 
(2010),  this  location  was  still  fitted  with  the  high-pressure  blast  sensor  from  the  previous  measurement, 
when  it  was  measuring  the  pressure  from  the  nearby  explosive  charge  at  SP1.  This  high-pressure  sensor  was 
unable  to  detect  the  low  pressure  waveform  produced  by  the  distant  source  SP4. 
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6.6.6  Comparison  with  measurements  in  spatial  domain 


One  of  the  primary  advantages  of  a  time-domain  wave  simulation  (FDTD  or  ARD)  is  the  ability  to  save 
snapshots  of  the  pressure  field  at  any  time  step  in  the  simulation.  These  snapshots  can  be  assembled  into  a 
movie  to  elucidate  wave-field  evolution  in  time  as  the  acoustic  pulse  travels  through  the  environment.  This 
movie  can  serve  as  a  useful  tool  for  studying  in  detail  the  complex  wave-interactions  involved  in  the  acoustic 
pulse  propagation. 

As  an  example,  wave-field  snapshots  validate  the  presence  of  rooftop  diffraction  paths  in  both  measured 
and  3D  ARD  waveforms  for  sensor  R09  and  source  SP1  (see  Figure  6.12).  In  Figure  6.10  (upper  trace), 
the  first  arrival  for  3D  ARD  and  the  measured  waveforms  happens  at  t  =  61  ms.  The  wave-field  snapshot 
in  Figure  6.12  at  t  =  61  ms  shows  that  the  corresponding  propagation  path  is  a  rooftop  diffraction  path 
from  the  top  of  building  A,  followed  by  a  diffraction  from  the  side  of  the  building  at  t  =  69  ms.  The  2D 
FDTD  simulation  cannot  model  3D  rooftop  propagation  paths  and  misses  the  energy  corresponding  to  this 
path.  Late  arrivals  in  the  waveform  correspond  to  high-order  reflections  between  building  A  and  E  that 
get  diffracted  around  the  side  or  the  rooftop  of  building  A  before  reaching  the  sensor.  These  correspond  to 
peaks  at  t  =  83, 104  and  177  ms,  which  are  marked  by  circles  on  top  the  waveforms  in  Figure  6.10.  These 
peaks  are  correctly  modeled  by  the  3D  ARD  simulation,  as  shown  by  the  wave-field  snapshots  for  these  times 
in  Figure  6.12.  The  wave-field  snapshots  and  movies  can  thus  provide  a  more  thorough  understanding  of 
acoustic  pulse  propagation. 

Figure  6.13  shows  the  variation  of  error  with  distance  between  the  source-receiver  positions.  The  error 
seems  to  be  independent  of  the  source-receiver  distance.  Note  that  error  values  are  much  higher  for  source 
position  SP4  than  others.  This  is  due  to  the  presence  of  more  NLOS  positions  in  SP4,  which  typically 
have  more  complex  propagation  characteristics  than  LOS  positions.  The  top  three  sensors  with  consistently 
high  errors  across  all  sources  are  R12,  R16  and  R15.  As  discussed  before,  R12  has  high  errors  due  to  open 
windows  in  building  A,  resulting  in  shorter  propagation  paths  that  are  not  modeled  in  the  simulation.  As 
for  sensor  R16  and  R15,  these  are  NLOS  positions  for  all  sources,  which  typically  have  higher  errors. 


147 


6.7  Conclusion,  Limitations  and  Future  Work 


In  this  paper,  acoustic  pulse  propagation  results  are  presented  for  a  large  urban  environment  in  three  di¬ 
mensions.  The  results  of  the  3D  simulation  provide  better  agreement  with  the  measured  data  than  the 
2D  simulation,  especially  in  cases  where  rooftop  diffraction  is  involved.  This  technique  enables  acoustic 
propagation  in  a  three-dimensional  scene  with  large  size  and  broad  frequency  range  on  a  desktop  computer. 

In  future,  we  would  like  to  explore  the  acoustic  pulse  propagation  in  three  dimensions  in  the  kHz 
range.  This  would  require  a  very  accurate  geometric  description  of  the  scene  (sub-meter  accuracy)  and 
parallelization  of  the  ARD  technique  on  a  computer  cluster.  Secondly,  we  plan  to  investigate  the  benefit 
of  a  full  3D  simulation  for  time  reversal  processing  [4]  to  compute  the  source  location  given  the  recorded 
waveforms  at  sensor  positions.  Lastly,  we  also  plan  to  study  the  transmission  of  sound  through  buildings  to 
determine  the  noise  levels  inside  the  buildings  from  exterior  sources. 


148 


Figure  6.3:  Validation  of  the  ARD  simulation  results  (dots)  with  analytical  expressions  (curves)  for  the 
scattering  of  a  spherical  wave  by  a  rigid  sphere.  Normalized  pressure  is  plotted  in  the  radial  axis.  The  radius 
of  the  sphere  a  =  1  m  and  wave  numbers  k  considered  are  0.7,  1.3  and  2  to-1,  respectively.  (Color  online.) 
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Figure  6.4:  Time  and  frequency  responses  produced  by  the  Biot-Tolstoy-Medwin  diffraction  model  (reference) 
and  the  ARD  simulation  for  a  right-angled  rigid  wall.  (Color  online.) 
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Figure  6.5:  The  source  pulse  used  for  modeling  the  blast  signal  produced  in  the  experiment  as  calculated 
from  Eq.(5)  (Liu  and  Albert,  2006). 
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Figure  6.6:  Calculated  acoustic  response  for  the  source  position  SP1  in  the  artificial  village  scene  using  the 
ARD  technique.  Simulated  wavefields  are  shown  at  times  t=75,  150,  225,  and  300  ms. 
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Figure  6.7:  Waveforms  calculated  by  the  3D  ARD  (upper)  and  2D  FDTD  (lower)  simulations  and  measured 
(middle)  in  the  artificial  village  for  source  position  1  at  14  receiver  positions.  All  the  responses  have  been 
individually  normalized  and  low-passed  to  the  maximum  frequency  of  200  Hz.  The  error  between  the  2D 
FDTD  simulation  and  measurement  (dashed  line)  and  3D  ARD  simulation  and  measurement  (solid  line)  has 
been  calculated  using  the  DTW-based  SDM  and  ADM  metric. 
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Figure  6.8:  Waveforms  calculated  by  the  3D  ARD  simulator  (upper)  and  measured  (lower)  in  the  artificial 
village  for  source  position  2  at  14  receiver  positions.  All  the  waveforms  have  been  individually  normalized 
and  low-passed  to  the  maximum  frequency  of  450  Hz.  The  error  between  the  3D  ARD  simulation  and 
measurement  has  been  calculated  using  the  DTW-based  SDM  and  ADM  metric. 
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Figure  6.9:  Waveforms  calculated  by  the  3D  ARD  simulator  (upper)  and  measured  (lower)  in  the  artificial 
village  for  source  position  3  at  14  receiver  positions.  All  the  responses  have  been  individually  normalized 
and  low-passed  to  the  maximum  frequency  of  450  Hz.  The  error  between  the  3D  ARD  simulation  and 
measurement  has  been  calculated  using  the  DTW-based  SDM  and  ADM  metric. 
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Figure  6.10:  Comparison  between  the  calculated  and  measured  waveforms  for  the  source  position  SP1  and 
receiver  R09  behind  building  A.  The  upper  trace  (solid  line)  corresponds  to  the  3D  ARD  waveform  whereas 
the  lower  trace  (solid  line)  corresponds  to  the  2D  FDTD  waveform.  The  measured  waveform  is  drawn  as 
dotted  line.  Note  that  the  2D  FDTD  simulation  cannot  model  the  diffraction  path  from  the  building’s 
rooftop  resulting  in  the  missing  first  arrival  at  60  ms. 
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Figure  6.11:  Waveforms  calculated  by  the  3D  ARD  (upper)  and  2D  FDTD  (lower)  simulations  and  measured 
(middle)  in  the  artificial  village  for  source  position  4  at  14  receiver  positions.  All  the  responses  have  been 
individually  normalized  and  low-passed  to  the  maximum  frequency  of  200  Hz.  The  error  between  the  2D 
FDTD  simulation  and  measurement  (dashed  line)  and  3D  ARD  simulation  and  measurement  (solid  line)  has 
been  calculated  using  the  DTW-based  SDM  and  ADM  metric. 
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Figure  6.12:  Rooftop  diffraction  Calculated  acoustic  response  for  the  source  position  SP1  in  the  artificial 
village  scene  using  the  ARD  technique.  Receiver  position  R09  is  marked  by  a  white  circle  on  the  right  of  the 
first  building  (from  right).  Simulated  wavefields  are  shown  at  times  t  =  20,  30,  40,  50,  61,  83,  104,  161  and 
177  ms  corresponding  to  marked  points  in  Figure  6.10. 
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Figure  6.13:  Variation  of  error  with  distance  between  the  source-receiver  positions. 
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